I. ABSTRACT


During the lifecycle of many single-stranded RNA viruses, including many human pathogens, a protein shell called 
the capsid spontaneously assembles around the viral genome. Understanding the mechanisms by which capsid
proteins selectively assemble around the viral RNA amidst diverse host RNAs is a key question in virology. In one
proposed mechanism, sequence elements (packaging sites) within the genomic RNA promote rapid and efficient
assembly through specific interactions with the capsid proteins. In this work we develop a coarse-grained particle-based
computational model for capsid proteins and RNA which represents protein-RNA interactions arising both from
nonspecific electrostatics and specific packaging site interactions. Using Brownian dynamics simulations, we explore how
the efficiency and specificity of assembly depend on solution conditions (which control protein-protein and nonspecific 
protein-RNA interactions) as well as the strength and number of packaging sites. We identify distinct regions in 
parameter space in which packaging sites lead to highly specific assembly via different mechanisms, and others in 
which packaging sites lead to kinetic traps. We relate these computational predictions to in vitro assays for specificity 
in which cognate viral RNAs compete against non-cognate RNAs for assembly by capsid proteins.

II. INTRODUCTION

In many single-stranded RNA virus families, the spontaneous assembly of a protein container (capsid) around the 
viral RNA is an essential step in the viral life cycle [1]. Formation of an infectious virion requires that the assembling
proteins select the viral RNA out of the milieu of cellular RNA, and most viruses do so with high specificity (e.g. 
99% [2]) in vivo. Understanding the mechanisms which enable such specific co-assembly could guide the design of 
delivery vectors that assemble around specific drugs or genes, and could identify targets for antiviral agents that 
interfere with genome packaging. In this work, we use dynamical computer simulations to investigate the ability of 
sequence-specific RNA-protein interactions (packaging sites) to drive specific packaging of the viral genome, and how 
specificity depends on the underlying sequence-independent interactions. 

A key driving force for RNA-capsid protein co-assembly is provided by electrostatic interactions between RNA 
phosphate groups and basic amino acids, often located in flexible tails known as arginine rich motifs (ARMs) (e.g.,
[?]). These nonspecific interactions are sufficient for assembly, as shown by the ability of capsid proteins to assemble
in vitro around heterologous RNA, synthetic polyelectrolytes, and other negatively charged substrates [?]. In
vitro assembly assays [?] and computational modeling [?] indicate that the charge and structure arising from
base pairing of viral RNAs are optimal for assembly by their capsid proteins. However, these physical characteristics
alone cannot explain the remarkably specific packaging of the viral genome achieved by many RNA viruses in vivo. 
Several factors have been proposed to explain specific packaging in vivo, including subcellular localization of viral 
components [21], coordinated translation and assembly [22, 24], and RNA-sequence-specific interactions between capsid
proteins and sites within the genome called packaging sites (PSs). PSs have been identified for a number of unrelated
viruses infecting plant, animal, or bacterial hosts, suggesting this mechanism has widespread relevance [25-33].

The specificity conferred by PSs has been explored through in vitro experiments, either by comparing assembly 
yields of capsid proteins around cognate and non-cognate RNAs in separate experiments or by competition assays, 
in which two RNA species compete for packaging under limiting protein concentrations. Measured selectivities have
varied widely, ranging from high selectivity for the cognate RNA [?], to no selectivity [35], to selectivity for a
non-cognate RNA [?]. Two recent experiments observed that assembly around cognate RNAs proceeded via different,
faster assembly pathways than around non-cognate RNAs [36, 37]. The authors suggest that their experiments are
more selective for cognate RNAs because they use a lower protein concentration than previous experiments (1 μM vs
10 μM).

Using chemical kinetics simulations (Gillespie algorithm [38-40]), Dykeman et al. [22, 41] predicted that assembly under
dynamic subunit concentrations, i.e. the concentration increase (‘ramp’) that occurs during an infection cycle in E.
coli, could lead to 100% specificity for RNAs with PSs (represented by nonuniform protein binding affinities) even
under a large excess of non-cognate RNAs (represented by uniform binding affinities). In contrast, constant subunit
concentrations led to weak differences in yield (about 5%) and a significant portion of malformed capsids. However,
these simulation results do not entirely address the recent in vitro experiments [36, 37] in which PSs led to high
yield assembly while non-cognate assembly was unsuccessful using constant subunit concentrations. A limitation
of Gillespie algorithm simulations is that the state space (the set of allowed partial capsid geometries and RNA
configurations) and the transition rates (e.g. association rates among RNA-bound subunits) must be assumed a
priori [1]. It is therefore difficult to account for complex processes such as cooperative RNA-protein motions seen in
previous Brownian dynamics simulations [42, 43]. While these assumptions can be guided by experimental data in
certain cases, we seek here to determine the ensemble of possible assembly pathways and products.

We recently developed a particle-based computational model for RNA and capsid proteins [?] with which
capsid assembly is simulated using Brownian dynamics. Although the model is coarse-grained, model predictions for
RNA lengths that optimize capsid thermostability quantitatively agreed with viral genome length for seven viruses
[?]. We previously examined how varying the nonspecific electrostatic RNA-protein subunit interactions, solution
conditions, and subunit-subunit interactions leads to a range of assembly outcomes and different classes of assembly 
pathways [44] . 

Here, we explore how introducing specific PS interactions, in a simple form inspired by a recent structural investigation
of STNV [37], alters these assembly pathways and products. By extensively comparing assembly around uniform
polyelectrolytes (representing non-cognate RNA) and PS-containing polyelectrolytes (cognate RNA), we identify solution
conditions that lead to highly specific packaging of the cognate RNA. Depending on the relative strength of
protein-protein and protein-RNA interactions, we find that PSs can drive specific assembly via several mechanisms.
Consistent with recent single molecule experiments [36], the simulations indicate that PSs can trigger assembly via
pathways with more compact intermediates as compared to non-cognate RNAs. However, we also find solution conditions
under which PSs are unable to drive specific packaging or even lead to kinetic traps. We then investigate how
assembly yields and specificity depend on the number and strength of PSs. In general, we find that a combination
of one high affinity PS and multiple weak PSs leads to the highest assembly yields, consistent with the identification
of multiple weak PSs in viral genomes [?] and with previous observations that productive self-assembly reactions
require reversible interactions [?] [45]. We conclude by discussing potential experimental predictions suggested by
these simulations.


III. MODEL 


To study the effect of PSs on assembly, we have extended a recently developed model [?] for assembly around
linear polyelectrolytes and non-cognate RNAs to include a representation of PSs. The model is motivated by recent
experiments in which purified simian virus 40 (SV40) capsid proteins assemble in vitro around ssRNA molecules to
form virus-like particles composed of 12 homopentamer subunits [46, 47]. The model capsid is therefore a dodecahedron
comprising 12 pentagonal subunits, each of which represents a homopentamer of the capsid protein. It is assumed
that homopentamers are stable and form rapidly in solution, as is the case for SV40. Although the structure of the
model capsid is motivated by these experiments [46, 47], in this article we use the model to study general relationships
between PSs and assembly which could apply to many viral species.

Capsid protein subunit-subunit interactions. Following Refs. [?], model subunits are attracted to
each other via attractive pseudoatoms, ‘attractors’ (type ‘A’) at the vertices, which interact via a Morse potential (see
Fig. 1 and the Methods section). The subunit-subunit interaction strength is controlled by the model parameter ε_ss;
the free energy of subunit dimerization is g_ss/k_BT = 5.0 - 1.5 ε_ss. This does not include the effects due to repulsions
between ARMs (defined below), which we estimate to reduce the binding strength by approximately 0.5 k_BT at
100 mM; see SI section S1C. These attractions represent the interactions between capsid protein subunits that arise
from hydrophobic, van der Waals, and electrostatic interactions [?], whose strength can be experimentally tuned by
pH and salt concentration [?].
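
The mapping from the well depth to the dimerization free energy can be tabulated with a few lines of code. The following is a minimal sketch, assuming the relation reads g_ss/k_BT = 5.0 - 1.5 ε_ss with ε_ss expressed in units of k_BT; the function name is hypothetical, and the ARM-ARM repulsion correction mentioned in the text is deliberately left out.

```python
def dimerization_free_energy(eps_ss: float) -> float:
    """Return g_ss / k_BT for a given well depth eps_ss (in k_BT).

    Assumes the linear relation g_ss / k_BT = 5.0 - 1.5 * eps_ss and
    ignores the ARM-ARM repulsion correction discussed in the text.
    """
    return 5.0 - 1.5 * eps_ss


if __name__ == "__main__":
    # Tabulate a few well depths spanning the range explored in the paper.
    for eps in (2.0, 4.0, 6.0):
        g = dimerization_free_energy(eps)
        print(f"eps_ss = {eps:.1f} k_BT -> g_ss = {g:+.1f} k_BT")
```

Under this reading, the dimerization free energy changes sign near ε_ss ≈ 3.3 k_BT, so the weakest well depths considered in the Results correspond to unfavorable dimerization in the absence of RNA.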

Pairs of subunits are driven toward a preferred subunit-subunit angle consistent with a dodecahedron (116 degrees) 
by repulsive ‘Top’ pseudoatoms (type ‘T’), which interact via the repulsive term of the Lennard-Jones (LJ) potential. 
The ‘Bottom’ pseudoatoms (type ‘B’) have a repulsive LJ interaction with ‘T’ pseudoatoms, intended to prevent 
‘upside-down’ assembly. The ‘T’, ‘B’, and ‘A’ pseudoatoms form a rigid body [?]. See Refs. [40, 42, 52-67]
for related models.

Sequence-independent electrostatic interactions. Capsid assembly around nucleic acids and other
polyelectrolytes is driven by electrostatic interactions between negative charges on the encapsulated polyelectrolyte and
positive charges on capsid protein-RNA binding domains [?]. To account for these interactions, we extend the model
as follows. First, we add positively charged bead-spring polymers affixed to the inner surface of the subunit, to represent
the highly charged, flexible terminal tails known as arginine rich motifs (ARMs) that are typical of positive-sense
ssRNA virus protein-RNA binding domains (e.g., [?]). There are five ARMs per pentameric subunit. For each ARM, the
first segment is anchored at a fixed position on the subunit, midway between the subunit center and a vertex. Except 
where stated otherwise, each ARM contains five segments of charge +e. To better represent the capsid shell, we 
include a layer of ‘Excluder’ pseudoatoms, which have a repulsive LJ interaction with the RNA and the ARMs. The 
‘Excluders’ and first ARM segment are part of the subunit rigid body. ARM beads interact through repulsive LJ 
interactions and, if charged, electrostatic interactions. 

To represent an RNA molecule, we consider a linear bead-spring polyelectrolyte, with a charge of -e per bead and 
a persistence length comparable to that of ssRNA in the absence of base pairing. To focus on the effect of PSs, we 
do not consider RNA base pairing in this work; the effect of base pairing on assembly was considered in Ref. [?].
We also previously determined how assembly depends on polyelectrolyte length [?]. In the present work, for each
simulated salt concentration and capsid structure, we use the polyelectrolyte length that optimizes assembly. The
values of optimal lengths as a function of salt concentration are shown in Fig. S8.

Electrostatics are modeled using Debye-Hückel (DH) interactions, where the Debye screening length (λ_D) is given
by λ_D ≈ 0.3/√C_salt, with λ_D in nm and C_salt the concentration of monovalent salt in molar units. Perlmutter et al. [19]
showed that DH interactions compare well to simulations with explicit counterions for the parameter values under
consideration; comparisons between simulations with DH interactions and those with explicit counterions are presented
in Refs. [?] and Fig. [?].
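
To make the salt dependence concrete, the screening length and a screened pair energy can be evaluated directly. This is a minimal sketch under stated assumptions: the function names are hypothetical, the pair energy uses the textbook point-charge Debye-Hückel form (the exact functional form used in the model is given in the SI and is not reproduced here), and the Bjerrum length of 0.714 nm is the standard value for water near room temperature rather than a parameter quoted in this section.

```python
import math


def debye_length_nm(c_salt_molar: float) -> float:
    """Approximate Debye screening length (nm) for monovalent salt.

    Uses the rule of thumb lambda_D ~ 0.3 / sqrt(C_salt), with C_salt in mol/L.
    """
    if c_salt_molar <= 0:
        raise ValueError("salt concentration must be positive")
    return 0.3 / math.sqrt(c_salt_molar)


def dh_pair_energy(q1: float, q2: float, r_nm: float,
                   c_salt_molar: float, bjerrum_nm: float = 0.714) -> float:
    """Screened Coulomb energy (in k_BT) between two point charges.

    Textbook Debye-Hueckel form: l_B * q1 * q2 * exp(-r / lambda_D) / r.
    The Bjerrum length default is an assumed value for water at ~25 C.
    """
    lam = debye_length_nm(c_salt_molar)
    return bjerrum_nm * q1 * q2 * math.exp(-r_nm / lam) / r_nm


if __name__ == "__main__":
    for c_mM in (50, 100, 300, 500):
        lam = debye_length_nm(c_mM / 1000.0)
        print(f"C_salt = {c_mM:3d} mM -> lambda_D = {lam:.2f} nm")
```

Across the 50 to 500 mM range explored in the Results, the screening length under this approximation shrinks from roughly 1.3 nm to roughly 0.4 nm, which is why nonspecific protein-RNA attraction weakens sharply at high salt.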

Simulations and units. Simulations were performed with the Brownian Dynamics algorithm of HOOMD, which
uses the Langevin equation to evolve positions and rigid body orientations in time [68-70]. Simulations were run
using a set of fundamental units. The fundamental energy unit is selected to be 1 k_BT. The unit of length
D_u is set to the circumradius of a pentagonal subunit, which is taken to be 1 D_u = 5 nm so that the dodecahedron
inradius of 1.46 D_u = 7.3 nm gives an interior volume consistent with that of the smallest T=1 capsids. To calculate
the thermodynamic optimal encapsidation length, we placed a very long polymer in or near a preassembled capsid,
with one of the capsid subunits made permeable to the polymer, and performed unbiased Brownian dynamics. Once
the amount of packaged polymer reached equilibrium, the thermodynamic optimum length L* was measured. We
previously [19] found that this strategy closely matched the result obtained using the Widom insertion method [?] as
applied to growing polymer chains [?]. Assembly simulations were run at least 10 times for each set of parameters,
each of which was concluded at either completion, persistent malformation, or 2 x 10^[?] time steps. This observation
time was chosen based on the time after which assembly yields and outcomes change only logarithmically with time
for most parameter values. For all dynamics simulations there were 60 subunits in a box of size 200 x 200 x 200 nm,
resulting in a concentration of 12 μM.
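
The quoted subunit concentration follows from the particle count and the box volume. The short check below is a sketch with a hypothetical helper name; it only uses the conversion 1 nm^3 = 1e-24 L and Avogadro's number.

```python
AVOGADRO = 6.02214076e23  # particles per mole (exact SI value)


def molar_concentration(n_particles: int, box_edge_nm: float) -> float:
    """Concentration in mol/L for n_particles in a cubic box of given edge (nm)."""
    volume_liters = (box_edge_nm ** 3) * 1e-24  # 1 nm^3 = 1e-24 L
    return n_particles / (AVOGADRO * volume_liters)


# 60 pentameric subunits in a 200 nm cubic box, as stated in the text.
conc_uM = molar_concentration(60, 200.0) * 1e6
print(f"subunit concentration: {conc_uM:.1f} uM")
```

The result is between 12 and 13 μM, consistent with the value of 12 μM stated above.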


[Figure 1 panels: (A) Subunit Side View; (B) PS Interaction]



FIG. 1. (A) Model schematic showing components responsible for subunit-subunit interactions: subunits are bound together 
by attractor pseudoatoms (‘A’), and the Top (‘T’) and Bottom (‘B’) pseudoatoms guide the subunits towards the correct 
geometry (see SI). (B) Schematic with components responsible for attractive interaction with the RNA (drawn in red) and 
packaging site (‘PS’): positively charged ARM (‘+’) and PS Receptor (‘PSR’). The ‘Excluder’ pseudoatoms, which represent the 
excluded volume of the capsid shell, are located within the black pentagons; to aid visibility, they are not explicitly drawn here. 
Snapshots here and throughout the article are colored as follows: blue=excluders, green=attractors, yellow=ARM, red=RNA, 
orange=PS. 


Packaging sites (PSs). Structures of PSs obtained for a number of viruses through x-ray crystal structures and/or 
bioinformatics correspond to short stem loops [26]. For example, multiple short stem-loops with a single-stranded 
loop motif of A.X.X.A, where X corresponds to any nucleotide, were identified as PSs in the STNV genome [?].
An x-ray structure of STNV VLPs containing stem loop fragments revealed that they bind to well-defined sites on 
the protein ARMs with the effect of bringing multiple subunits into proximity and favoring alignments conducive to 
subunit-subunit interaction, thus enhancing subunit-subunit interactions as well as subunit-RNA interactions [37] . 

To account for these effects, we have extended the model to include a generic representation of PS interactions
by adding new pseudoatoms (denoted as packaging site receptors, PSRs) to the model protein subunits (Fig. 1B).
The PSRs experience short-range interactions with particular RNA segments that correspond to PSs. For simplicity,
the PS-PSR interaction uses the same short-range Morse potential as the attractor-attractor interaction (Eq. [?] in
section S1A). The strength of the PS-PSR attractive interaction is parameterized by the interaction well-depth ε_PS.
There are five PSRs per pentameric subunit. Except where noted otherwise, each PSR is located approximately
midway between an ARM anchor segment and a subunit vertex. This location allows for a PS to simultaneously bind
to 3 PSRs when three subunits form an optimal configuration. Thus, the PSs not only promote subunit-RNA binding,
but also generate RNA-mediated subunit-subunit interactions, as inferred from structural data [?].

Recent experiments have identified multiple PSs within several viral genomes (e.g. [?]) and shown that these
PSs bind capsid proteins with a range of affinities [?]. Typically there are one or a few high affinity PSs (e.g. nM
K_d), with the remainder having weaker affinities (up to μM K_d) [77].

In our simulations, we explore how assembly depends on (A) the number of PSs (N_PS), (B) the PS binding affinity
(ε_PS), and (C) the distribution of PS binding affinities along the polyelectrolyte. (A) There are 60 PSRs in a complete
model capsid, located at 20 threefold axes. Thus, an RNA with 20 PSs can interact with every PSR, and we will refer
to N_PS = 20 as the ‘stoichiometric’ number of PSs. (B) To limit the number of model parameters, we consider two
classes of model PSs: high affinity sites, with PS-PSR interaction well depth ε_PS = 20 k_BT (see Eq. [?]), and low affinity
sites, with ε_PS = 5 k_BT (see Section S1C for discussion of binding free energies). (C) To describe how assembly
depends on the distribution of affinities, we consider three forms of PS distributions along the model RNA: (i) N_PS
high affinity PSs, (ii) N_PS low affinity PSs, and (iii) 1 high affinity PS along with N_PS low affinity PSs. In each case
the PSs are placed at a uniform interval along the RNA. Unless otherwise noted, the strong PS in distribution (iii) is
placed at the center of the RNA, as found for MS2 [?].
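
The three PS distributions can be illustrated by assigning a well depth to each bead of the model polyelectrolyte. This is a hypothetical sketch rather than the authors' setup code: the function name, the rule used to pick evenly spaced beads, and the handling of a collision between the central high affinity site and a low affinity site (the nearest free bead is used) are all assumptions. The well depths of 20 and 5 k_BT are the high and low affinity values as read from the text.

```python
def ps_well_depths(n_beads: int, n_ps: int, distribution: str,
                   eps_ha: float = 20.0, eps_la: float = 5.0) -> list:
    """Return the PS-PSR well depth (k_BT) for each RNA bead; 0.0 means no PS.

    distribution: "HA"    -> n_ps high affinity PSs
                  "LA"    -> n_ps low affinity PSs
                  "combo" -> 1 high affinity PS near the center + n_ps low affinity PSs
    """
    if distribution not in ("HA", "LA", "combo"):
        raise ValueError("distribution must be 'HA', 'LA', or 'combo'")
    if not 0 <= n_ps < n_beads:
        raise ValueError("need 0 <= n_ps < n_beads")

    depths = [0.0] * n_beads
    base = eps_ha if distribution == "HA" else eps_la
    # Uniform spacing: put PS k at the middle of the k-th of n_ps equal intervals.
    for k in range(n_ps):
        depths[int((k + 0.5) * n_beads / n_ps)] = base

    if distribution == "combo":
        # Place the single strong PS on the free bead closest to the chain center.
        center = n_beads // 2
        free = [i for i, d in enumerate(depths) if d == 0.0]
        depths[min(free, key=lambda i: abs(i - center))] = eps_ha
    return depths


if __name__ == "__main__":
    combo = ps_well_depths(500, 25, "combo")
    print("LA sites:", combo.count(5.0), "HA sites:", combo.count(20.0))
```

For example, a 500-bead chain with N_PS = 25 in the combo distribution carries 25 low affinity sites and one high affinity site, 26 PSs in total.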

Comparison to existing models. Dykeman et al. [?] extended the kinetic rate equation approach of Becker
and Döring [?] and Zlotnick [?] to include a representation of RNA and PSs. In this model the Gillespie algorithm
[38-40] is used to stochastically sample paths according to a predefined state space and matrix of inter-state transition
rates. They assume that each of the PSRs is bound by RNA once and that subunits adding to partial capsids are
bound to adjacent RNA segments, so that assembly must follow a Hamiltonian path. In the present model the
spatial positions and dynamics of subunits are explicitly tracked and thus there are no assumptions made about the
state space or assembly pathways. Consequently, there are no explicit restrictions on the sequence of RNA binding
sites, although steric hindrances disfavor binding of multiple PSs at the same threefold axis and RNA conformational
statistics favor returning to nearby binding sites.


IV. RESULTS 

In this section we describe how assembly depends on the specific (PS-PSR) and nonspecific (electrostatic)
interactions. We refer to polyelectrolytes equipped with only nonspecific interactions as non-cognate RNAs, and
polyelectrolytes which contain one or more PSs as cognate RNAs. Note that since we neglect base pairing in this
work, the non-cognate RNA is a linear polyelectrolyte. However, since we also neglect base pairing in the cognate
RNA, our results are applicable to a comparison between cognate and non-cognate RNAs, except to the extent that
the tertiary structure of cognate RNAs is more favorable for assembly than that of non-cognate RNAs [?] [80].

FIG. 2. The effect of PSs and solution conditions on assembly yields and products. (A,B) The most prevalent assembly
product is shown as a function of ionic strength C_salt and subunit-subunit attraction well-depth ε_ss for assembly around (A)
a non-cognate RNA (polyelectrolyte without PSs), and (B) a cognate RNA with 1 high affinity (HA, ε_PS = 20 k_BT) PS and 25
low affinity (LA, ε_PS = 5 k_BT) PSs. A legend showing the outcome and a representative simulation snapshot corresponding
to each symbol is presented in (C). (D,E) The yield of well-formed capsids assembled around (D) the non-cognate RNA or
(E) the cognate RNA with the PS sequence as in (B). In each simulation the RNA length corresponds to the thermodynamic
optimal length for the non-cognate RNA at the simulated value of C_salt, and ranges from 350 to 575 RNA segments
(see Fig. S7).

A. The yield and specificity conferred by PSs depend on subunit-subunit and nonspecific electrostatic interactions


Yield without PSs. The assembly of non-cognate RNA (uniform polyelectrolytes) depends on the strength of
subunit-subunit interactions (controlled by ε_ss in our model, Eq. [?] in section S1A, and by salt concentration or pH
in vitro) and sequence-independent electrostatic interactions (controlled by the salt concentration C_salt). The dependence of
assembly outcomes on these parameters for a non-cognate RNA is summarized in Fig. 2A. High yields of well-formed
VLPs are observed for C_salt ∈ [50-400] mM and moderate subunit-subunit interaction strengths, ε_ss ∈ [4-6] k_BT.
Outside of optimal parameter values, yields are suppressed by several failure modes: strong electrostatics lead to
disordered aggregates, strong subunit-subunit interactions lead to malformed capsids, and overly weak interactions
lead to unnucleated complexes [44]. The yield f_nc(t) is defined as the fraction of simulations which, at time t, resulted
in formation of a complete capsid (defined as 12 subunits each strongly interacting with five neighbors) completely
encapsulating the non-cognate RNA. The yield f_nc(t_end) at the simulation endpoint t_end is shown in Fig. 2D. For each
case, at least 10 simulations are run, so the estimated error in the yield ranges from 0.07 to 0.13 [81].
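
The completeness criterion and the yield can be expressed compactly. The sketch below is illustrative only: the function names are hypothetical, the neighbor map is assumed to have been computed beforehand (the energy threshold that defines a "strong" interaction is not specified here), and the additional requirement that the RNA be fully encapsulated is not checked.

```python
def is_complete_capsid(neighbors: dict) -> bool:
    """True if there are 12 subunits, each strongly bound to exactly five others.

    neighbors maps a subunit index to the set of subunits it strongly
    interacts with; the relation is required to be symmetric.
    """
    if len(neighbors) != 12:
        return False
    return all(
        len(nbrs) == 5 and all(i in neighbors.get(j, ()) for j in nbrs)
        for i, nbrs in neighbors.items()
    )


def yield_fraction(outcomes: list) -> float:
    """Fraction of independent simulations that produced a complete capsid."""
    if not outcomes:
        raise ValueError("need at least one simulation outcome")
    return sum(bool(o) for o in outcomes) / len(outcomes)


if __name__ == "__main__":
    # Ten hypothetical trajectories, seven of which ended in a complete capsid.
    print(yield_fraction([True] * 7 + [False] * 3))
```

With only about 10 trajectories per parameter set, each trajectory shifts the yield by 0.1, which is the same order as the statistical error quoted in the text.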

Yield with PSs. Based on observations of multiple low affinity PSs [?] and simulations at varying numbers and
strengths of PSs (section IV B below), we performed simulations at varying ε_ss and C_salt for an RNA with 1 high
affinity PS and N_PS = 25 low affinity PSs (see section S1A). With the addition of PSs, the range of parameters leading
to high assembly yields (f_c) broadens considerably (Fig. 2B,E), allowing assembly at much lower values of ε_ss across
a wide range of C_salt and increasing the upper range of ε_ss leading to assembly at low C_salt.

Notably, assembly around the non-cognate RNA fails in distinct ways in these parameter regions (Fig. 2A), indicating
that PSs can avoid multiple forms of thermodynamic or kinetic traps. At low ε_ss, the predominant effect of PSs is to
enhance nucleation and growth rates by increasing effective interaction strengths. Increased subunit-RNA interactions
are most relevant at high salt while increased subunit-subunit interactions are most relevant at low salt (discussed in
section IV B). At high ε_ss and low or moderate salt (e.g. C_salt = 50 mM and ε_ss = 7 k_BT), assembly around non-cognate
RNA frequently leads to the nucleation of multiple partial capsids on the same RNA; typically these intermediates
have incompatible geometries and either fail to combine or form malformed capsids. In the cognate RNA simulations,
assembly rapidly nucleates around the HA PS; the LA PSs then enhance growth rates such that assembly is completed
before additional partial capsids can nucleate elsewhere on the RNA. The frequency of multiple nucleation events as
well as the structural heterogeneity of assembly intermediates are shown in SI Fig. S4.

Specificity. An estimate of the specificity conferred by PSs can be obtained by comparing the assembly dynamics 
in the presence and absence of PSs. We calculated the probability that, for a given ε_ss and C_salt, assembly of a
well-formed capsid occurs around the cognate RNA before the non-cognate RNA, normalized by the probability of 
complete assembly around either substrate: 




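$$ P_1 = \frac{\int_0^{t_{end}} dt \, p_c(t) \, [1 - f_{nc}(t)]^{r_{ex}}}{f_c(t_{end}) + (1 - [1 - f_{nc}(t_{end})]^{r_{ex}}) - f_c(t_{end}) (1 - [1 - f_{nc}(t_{end})]^{r_{ex}})} \qquad (1) $$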


where f_nc(t) and f_c(t) are the time-dependent yields around non-cognate and cognate RNAs (measured from the
simulations whose final yields are shown in Figs. 2D,E), and p_c(t) = df_c(t)/dt is the assembly time probability
distribution function for cognate RNAs. The parameter r_ex = C_nc/C_c is the ratio of non-cognate to cognate RNA
concentrations. Eq. 1 for r_ex = 1 is shown in Fig. 3A.
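
The specificity estimate can be evaluated numerically from sampled yield curves. The sketch below is a hedged illustration, not the authors' analysis code: it assumes Eq. 1 has the first-passage form in which the numerator integrates p_c(t) [1 - f_nc(t)]^{r_ex} over time (the probability that the cognate capsid completes at time t while none of the r_ex non-cognate RNAs has yet been packaged) and the denominator is the probability that assembly completes around either substrate. The function name and the trapezoid discretization are choices made here.

```python
def specificity_estimate(f_c, f_nc, r_ex: float = 1.0) -> float:
    """Estimate P_1 from cumulative yields sampled at common, increasing times.

    f_c, f_nc : sequences of cognate / non-cognate yields; the first entry is
                the yield at t = 0 (normally 0.0).
    r_ex      : ratio of non-cognate to cognate RNA concentrations.
    """
    if len(f_c) != len(f_nc) or len(f_c) < 2:
        raise ValueError("need two yield curves of equal length >= 2")

    # Probability that no non-cognate RNA has been packaged by each time point.
    survive = [(1.0 - y) ** r_ex for y in f_nc]

    # p_c(t) dt = d f_c, so integrate `survive` against increments of f_c
    # using the trapezoid rule.
    numerator = sum(
        0.5 * (survive[i] + survive[i + 1]) * (f_c[i + 1] - f_c[i])
        for i in range(len(f_c) - 1)
    )

    p_nc = 1.0 - (1.0 - f_nc[-1]) ** r_ex          # any non-cognate assembled
    denominator = f_c[-1] + p_nc - f_c[-1] * p_nc  # assembly around either
    return numerator / denominator if denominator > 0 else float("nan")


if __name__ == "__main__":
    # Identical yield curves with equal RNA concentrations give no selectivity.
    curve = [0.0, 0.5, 1.0]
    print(specificity_estimate(curve, curve, r_ex=1.0))
```

Two limiting cases serve as sanity checks: if the non-cognate RNA never assembles the estimate is 1, and if the two yield curves are identical with r_ex = 1 it is 0.5.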

In a fairly wide range of parameter space, the assembly is 100% specific for assembly around the cognate RNA.
However, at parameters which are optimal for assembly around non-cognate RNA (i.e. where assembly without PSs
leads to high yield, ε_ss ∈ [4-6] k_BT, C_salt ∈ [100-300] mM), there is essentially no selectivity. This result highlights
the importance of the solution conditions when assessing the role of PSs in vitro, and may suggest an explanation for
the varying levels of specificity for cognate RNAs observed in in vitro experiments (see the Introduction).

We next compare this competition estimate approach to explicit competition simulations which contain a cognate
RNA, r_ex non-cognate RNAs, and 60 pentamer subunits. In these simulations, we define specificity as the fraction of
simulations in which the first assembled capsid forms around a cognate RNA. Figure 3B presents results for r_ex = 1
at several subunit-subunit interaction strengths and salt concentrations. For several parameter sets in that figure,
assembly is not productive without PSs, and so as expected selectivity is 100%. At the other three parameter sets,
which result in incomplete selectivity, the predicted and measured values agree to within error. While in vitro assays
have typically focused on competition between equal concentrations of cognate and non-cognate RNAs, assembly in
vivo can occur under a large excess of cellular RNAs [82]. The estimated specificity for r_ex = 10 is shown in Fig. 3C.

We have also considered competition under subunit limiting conditions (1 cognate RNA, r_ex non-cognate RNAs,
and 18 protein subunits) so that at most one complete capsid can assemble, containing either a cognate or a
non-cognate RNA. As expected based on the independent assembly simulations, for parameters where assembly is not
productive without PSs (C_salt = 500 mM, ε_ss = 6 k_BT), we observe that assembly is 100% specific for the cognate RNA

[Figure 3 panels: (A) 1:1 Competition; (B) 1:1 Competition, Direct Simulation; (C) 1:10 Competition. Horizontal axis: Ionic Strength (mM); color scale from 0 to 1.]

FIG. 3. Selectivity for RNA containing 1 HA PS + 25 LA PSs competing against a non-cognate RNA at equal concentrations,
r_ex = 1, (A) estimated from the data in Fig. 2 using Eq. 1 and (B) measured in direct competition simulations. (C) Selectivity
for RNA containing 1 HA PS + 25 LA PSs competing against excess non-cognate RNA, r_ex = 10. As in Fig. 2, in each simulation
the optimal RNA length is used based on the results in Fig. S7. In the explicit competition simulations of (B) the concentration
of subunits is the same as used in the assembly simulations (Fig. 2).


in all simulations for r_ex ∈ [1-50]. Interestingly, assembly is also 100% specific for the cognate RNA at a parameter
set for which assembly is productive without PSs (C_salt = 500 mM, ε_ss = 7 k_BT, r_ex = 1). For these parameters,
nucleation occurs first around the PS, which reduces the number of free subunits and impedes nucleation around the
non-cognate RNA. We are exploring whether this effect remains in larger systems.

We note that the relationship between P_1 and specificity in direct competition assays could break down at low
salt, where subunits initially undergo nonspecific adsorption onto cognate and non-cognate RNAs. Assembly under
limiting subunit concentrations in these conditions requires exchange of subunits between RNAs, which occurs slowly
relative to our simulation timescales.


B. The effect of PSs depends on their number and strength 


We now discuss the dependence of assembly and specificity on the number and affinities of PSs. We focus on
two interaction parameter sets: C_salt = 100 mM, ε_ss = 2 k_BT and C_salt = 500 mM, ε_ss = 6 k_BT. These parameters lead to
100% specificity for the cognate sequence considered in Fig. 3 (1 high affinity PS and N_PS = 25 low affinity PSs),
but represent very different strengths of nonspecific electrostatic interactions and correspondingly different assembly
pathways around non-cognate RNAs. For each of these interaction parameter sets, we simulated three distributions
of PS affinities along the model RNA (see section S1A): (i) N_PS high affinity (HA) PSs, (ii) N_PS low affinity (LA)
PSs, and (iii) a ‘Combo’ distribution with 1 HA PS and N_PS LA PSs. Recall that the Combo sequence with N_PS = 25
is considered in Figs. 2 and 3.

As shown in Fig. [?], assembly yields for both interaction parameter sets are most robust under the Combo PS
distribution. High yields are obtained for intermediate values of N_PS, although the yield is optimal for sub-stoichiometric
N_PS < 20 at moderate salt and super-stoichiometric N_PS > 20 at high salt (recall that there are N_PS = 20 PS binding
sites in a complete capsid).

However, the effect of PSs on assembly mechanisms, and hence the dependence on PS distribution, is markedly
different for the two parameter sets. At C_salt = 100 mM, the subunits rapidly adsorb onto the RNA. Without PSs, the
weak subunit-subunit interactions (ε_ss = 2 k_BT) are insufficient to drive subsequent assembly, resulting in a disordered
aggregate (Fig. [?]). Nonetheless, even a sub-stoichiometric number of PSs is sufficient to promote complete assembly
(Fig. [?]). High yields are observed for 6-8 HA PSs, 10 LA PSs, and for N_PS ∈ [10, 20] for the Combo case. For larger
than optimal N_PS, multiple partial capsids nucleate on the same RNA, leading to long-lived malformed assemblies
that suppress yields. Snapshots illustrating typical assembly outcomes at low, stoichiometric, and excess N_PS are
shown below the plots for each salt concentration in Fig. [?]. For the Combo PS sequences, after several subunits have
adsorbed onto the RNA, the strong PS initiates assembly, with further growth mediated by the weak PSs. When there
are multiple HA PSs, it is more likely for multiple small clusters to form, which may then merge into a single capsid,
with the final additions driven by electrostatic interactions.

We note that the effect of PSs under low salt and low ε_ss derives not from their ability to drive subunit-RNA
interactions, which are already strong due to nonspecific electrostatics, but rather from the location of packaging
site receptors (PSRs) at the capsid three-fold axes, which promotes subunit-subunit interactions (Fig. [?]). In support of
this conclusion, simulations in which PSRs were moved to the center of model subunits led to poor assembly (gold
diamond symbol in Fig. [?]).

At high salt concentration (C_salt = 500 mM), different PS sequences promote assembly (Fig. [?]). Without PSs under
these conditions few subunits adsorb on the RNA and nucleation does not occur on the timescales being simulated
(using Markov state modeling we determined that assembly eventually occurs around the non-cognate RNA on a
timescale which is two orders of magnitude longer [44]). With sub-stoichiometric N_PS, a cluster of subunits assembles
in the vicinity of PSs, but subsequent growth into a capsid is slow on simulated timescales. In contrast, moderate
to high yields are observed for super-stoichiometric PSs (N_PS ∈ [30-40]). For the Combo sequence, the HA PS
promotes rapid nucleation of a trimer, after which the LA PSs facilitate adsorption and binding to the cluster by
additional subunits. In contrast to the low salt conditions, super-stoichiometric HA PSs also lead to moderate yields
of well-formed capsids; because of the weak nonspecific electrostatics, malformed capsids are less prevalent.

With high salt and relatively strong subunit-subunit interactions (εss = 6 kBT) the ability of PSs to drive 
subunit-RNA interactions should be most relevant to promoting assembly. The effective subunit-subunit interactions promoted 
by PSs are stronger than optimal, as can be seen from the fact that reducing εss increases yields (Fig. [^). 

Consistent with this reasoning, eliminating the contribution of PSs to subunit-subunit interactions by moving the 
PSRs to subunit centers increased the yield (gold diamond symbol in Fig. 4B). 

While the location of PSRs within the capsid structure can significantly affect assembly, we found that the location 
of the strong PS along the RNA (in the Combo sequence) did not measurably alter the yield (open pentagon symbols 
in Figs. 4A,B). This observation approximately agrees with Dykeman et al. [41], who predicted a very weak dependence 
on HA PS location. 


C. PSs can alter assembly pathways 

Modeling mi Ei EUHS] and experiments mi EH EH] have shown that assembly pathways around non-cognate 
RNAs can be classified according to two extremes. Systems in which protein-protein interactions dominate assemble 
through nucleation-and-growth pathways with ordered intermediates, whereas strong protein-RNA interactions (low 
salt and/or high ARM charges) lead to the ‘en masse’ mechanism in which subunits rapidly adsorb on the RNA in a 
disordered manner, followed by cooperative rearrangements to form a capsid. Assembly pathways can be classified by 
the parameter nfree, defined as the number of subunits adsorbed to the RNA which are not part of the largest subunit 
cluster, averaged over system configurations for which the largest partial capsid intermediate has 4-6 subunits [44]. 
For our model capsid with 12 subunits, nfree ≳ 5 indicates the en masse mechanism, with smaller values indicating 
the nucleation-and-growth mechanism. As shown in Fig. 5 (left), assembly pathways around our model non-cognate RNA 
span the full range of nfree, with low salt and low εss leading to en masse pathways and high salt and high εss leading 
to nucleation-and-growth pathways. 
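
The definition of nfree given above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the per-frame input format (a set of adsorbed subunit indices and a set of indices in the largest cluster) and the function names are hypothetical, and subunits in the largest cluster are assumed to be adsorbed on the RNA.

```python
from statistics import mean

def n_free(frames, lo=4, hi=6):
    """Average number of adsorbed subunits outside the largest cluster.

    frames: iterable of (adsorbed_ids, largest_cluster_ids) pairs, one per
    saved configuration (hypothetical format; both entries are sets of
    subunit indices). Only frames whose largest cluster contains between
    lo and hi subunits contribute to the average.
    """
    values = [len(adsorbed - cluster)
              for adsorbed, cluster in frames
              if lo <= len(cluster) <= hi]
    return mean(values) if values else None

def classify_pathway(nf, threshold=5.0):
    """Apply the nfree criterion quoted for the 12-subunit model capsid."""
    return "en masse" if nf >= threshold else "nucleation-and-growth"

# Toy data: two frames qualify, the third has too small a cluster.
frames = [
    (set(range(10)), {0, 1, 2, 3}),
    (set(range(11)), {0, 1, 2, 3, 4}),
    ({0, 1, 2}, {0, 1, 2}),
]
value = n_free(frames)
print(value, classify_pathway(value))
```

The hard threshold in `classify_pathway` is a simplification; in the text the two mechanisms are extremes of a continuum.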

The addition of PSs has a striking effect on assembly pathways. As shown in Fig. 5 (center), assembly pathways around the 
Combo PS sequence with Nps = 25 correspond to the nucleation-and-growth mechanism over a broad range of εss and 
Csalt. Under most conditions the PSs increase the order of assembly intermediates (lowering nfree, Fig. 5, right) because 
adsorbed subunits are co-localized and well-positioned for assembly. However, under high εss the most significant 
effect of PSs is to increase subunit adsorption on the RNA, and thus PSs slightly increase nfree. Snapshots from 
representative trajectories for both of these cases are shown in Fig. 6. Consequently, nfree and correspondingly the 
nature of assembly pathways are less sensitive to conditions (εss and Csalt) than for the non-cognate RNA. This result 
parallels the observation that PSs reduce the sensitivity of assembly yields to control parameters (Fig. [^). 

Relationship between predicted assembly pathways and single molecule fluorescence correlation spectroscopy 
(smFCS) data. A means to test the predicted dependence of assembly pathways on solution conditions 
and PSs is provided by the fact that pathways with different values of nfree can be distinguished by the hydrodynamic 
radii (RH) of their early intermediates [44]. Recent experiments have used smFCS to monitor the time courses of RH 
during assembly around cognate and non-cognate RNAs [36, 89, 90]. Under the experimental conditions, assembly 
around cognate RNAs was rapid and characterized by either constant RH or a collapsed complex followed by a gradual 
increase to the size of an assembled capsid. Assembly around non-cognate RNAs was slower, with RH initially 
increasing before finally decreasing to the size of the capsid. 

To relate the predicted effect of PSs on assembly pathways (nfree) discussed above to an experimentally observable 
quantity, we estimated the hydrodynamic radii RH for polymer-subunit intermediates using the program HYDROPRO, 
which has been shown to accurately predict RH for large protein and protein-NA complexes [91]. Fig. 7 shows 
calculated RH for assembly around cognate and non-cognate RNAs for Csalt = 100 mM and several values of εss. For 
weak subunit-subunit interactions (εss = 2 kBT, Fig. 7A), assembly without PSs results in disordered aggregates 
[Figure 4 snapshot labels: (A) Disordered, Complete, Malformed; (B) Csalt = 500 mM: Unnucleated, Incomplete, Complete] 

FIG. 4. Yield as a function of the number of PSs, Nps, at low (A) and high (B) salt concentrations. Note that for these parameters 
the yield is zero in the absence of PSs. PSs are either all LA (■ symbols), all HA (• symbols), or the Combo sequence with 1 HA 
and Nps LA PSs (▲ symbols). For these cases, the HA PS is placed in the center of the RNA. Results from sets of simulations 
with the HA PS placed in the terminal position are shown as open pentagon symbols. The result from simulations with the PS 
binding site placed in the center of the subunits is shown as a ♦ symbol. Note that there are 20 PS binding sites in a complete 
capsid, so Nps = 20 is the stoichiometric value. Snapshots illustrate the trend in dominant outcomes with increasing PS number. 


(Fig. §), and the RH increases monotonically over time. With PSs, on the other hand, the RH initially increases as 
subunits attach to the RNA, and then rapidly decreases as the capsid assembles. Upon increasing the subunit-subunit 
interaction strength (εss = 3 kBT, Fig. 7B), successful assembly occurs with and without PSs; however, the increase 
in RH is greater and of longer duration in the absence of PSs. This difference occurs because the PSs enhance the 
assembly rate and decrease nfree. Finally, under stronger subunit-subunit interactions (εss = 4 kBT) PSs have little 
effect on nfree (Fig. 5), and correspondingly the time course of RH (Fig. 7C) is similar for cognate and non-cognate 
RNAs. 

The results at low subunit-subunit interaction strength resemble some key features of the experimental observations 
of RH around cognate and non-cognate RNAs, while the lack of effect of PSs on assembly pathways under larger εss 
emphasizes the fact that specificity depends on the underlying assembly driving forces. Importantly, the ability to 
change RH time courses does not depend on specific geometric features of the PSs in our simulations, but only requires 
that PSs promote rapid, ordered assembly pathways. Increasing the subunit-subunit interactions can achieve a similar 
effect to adding PSs in this regard (Fig. 7). 

In the simulations discussed thus far (Fig. 7A-C), we used a relatively short RNA (575 segments at Csalt = 100 mM), 
since this is the optimal length for our subunits with the small ARM charge (+5) [HI [44]. Therefore, the RH of the free 
RNA (prior to encapsidation) is less than that of the assembled capsid, and the measured RH increases substantially 
upon adsorption of subunits, until assembly of ordered partial capsids reduces RH. To examine the applicability of 
FIG. 5. The assembly pathway order parameter nfree measured from simulations for (left) the non-cognate RNA and (center) 
the cognate RNA with the Combo PS sequence, Nps = 25. (Right) The change in nfree due to PSs. 



[Figure 6 panel labels: εss = 3 kBT, I = 100 mM (Non-cognate, Cognate); εss = 7 kBT, I = 500 mM (Non-cognate, Cognate)] 

FIG. 6. Snapshots from typical assembly trajectories without and with PSs (the cognate RNA here is the Combo sequence with 1 
HA and 25 LA PSs) for low and high salt concentrations. PSs are depicted as large orange spheres. 


our findings to the more typical case in which the free RNA is similar to or larger than the capsid size, we also 
performed simulations on subunits with ARMs with charge (+10) and optimal RNA length 910 segments (Fig. 7D-F). 
The behavior of this system is qualitatively similar to that of the +5 ARMs, except that the initial increase in RH 
upon subunit adsorption is less apparent for ordered assembly pathways. Note that in previous simulations m we 
found that including base pairing did not qualitatively change RH time courses. However, we do not 
observe an increase in RH after the initial decrease, as observed in some of the experiments [36, 90]. This pattern 
might reflect a global RNA conformation change triggered by subunit binding, which is not currently considered by 
our model. 
FIG. 7. (A-F) Hydrodynamic radius RH as a function of simulation time steps for assembly trajectories performed at indicated 
parameter values, for non-cognate RNA (■ symbols) and cognate RNA (• symbols). The RH values before subunits are 
introduced are shown as ▲ symbols. The subunit-subunit interaction energy εss increases from left to right. In the top row 
(A-C), the subunit ARM charge is (+5), and the RNA length is 575 segments; in the second row (D-F), the subunit ARM 
charge is (+10), and the RNA length is 910 segments. (G) Snapshots from simulations corresponding to panel (E), with 
non-cognate RNA on the left and cognate RNA on the right. 


D. Restricted RNA Conformational Dynamics Inhibit Assembly 

Although our simulations show that PSs can promote efficient assembly under conditions in which nonspecific 
assembly fails, we also found that non-optimal numbers and affinities of PSs can lead to kinetic traps. In this 
section we discuss mechanisms that can lead to these kinetic traps. As seen in Fig. 4, RNAs with 20 HA PSs (the 
stoichiometric amount) fail to produce capsids at high or low salt. In part this outcome follows the well-known 
rule that strong interactions hinder self-assembly into highly ordered low free energy configurations by preventing 
‘local’ equilibration among partially assembled configurations and thus trapping the system in metastable disordered 
states m sa sg i53i iMi isa insii], as found for strong nonspecific electrostatics (Csalt < 10 mM) or subunit-subunit 
interactions (εss > 5 kBT, Fig. Q). However, a more nuanced explanation is required, since at high salt adding more 
HA PSs (Nps = 30, 40; εps = 20 kBT) does lead to moderate assembly yields (Fig. 4B). Analysis of the high salt, 
Nps = 20 simulation trajectories revealed ordered partially assembled capsid intermediates which, despite having 
optimal subunit-subunit interaction geometries, failed to reach completion (Fig. 8C). 

We hypothesized that the stalled assemblies result from poorly equilibrated RNA conformational dynamics within 
assembling capsids. To investigate the dependence of the RNA dynamics on PS sequences, we measured the fluctuations 
of the paths traced by RNAs within capsids (or capsid intermediates) over the course of dynamical trajectories. 
To simplify the analysis we define an RNA path as the sequence of capsid threefold sites with which RNA segments 
strongly interact (counting by the index of each strongly interacting RNA segment, Fig. 8B). Because the ARM 
anchoring sites and the PSRs are located near threefold sites, these sites have enhanced non-cognate RNA densities 
both in the presence and absence of PSs [19]. We note that these paths can be complex; for example, with strong 
electrostatics (Csalt = 100 mM), ≈ 50% of paths re-visit one or more threefold sites (meaning that segments which are 
nonlocal in sequence interact with the same threefold site), compared with only ≈ 10% at Csalt = 500 mM. Similarly, 
jumps between non-neighboring vertices are also common, present in ≈ 1/3 of paths. Thus, the RNA usually does 
not trace a Hamiltonian path within the capsid in our simulations. 
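
The path analysis described above can be illustrated with a short Python sketch. It is not the authors' analysis code: the contact-list input format, the function names, the choice to collapse consecutive contacts with the same site into one visit, and the toy four-site neighbor graph are all assumptions made for illustration (the model capsid has 20 threefold sites).

```python
def rna_path(contacts):
    """Build an RNA path from strong segment-site contacts.

    contacts: list of (segment_index, site_id) pairs (hypothetical format).
    Sites are ordered by segment index; consecutive repeats of the same
    site are collapsed into a single visit (an assumption of this sketch).
    """
    path = []
    for _, site in sorted(contacts):
        if not path or path[-1] != site:
            path.append(site)
    return path

def revisits(path):
    """True if some site is visited by segments that are nonlocal in sequence."""
    return len(set(path)) < len(path)

def has_nonneighbor_jump(path, neighbors):
    """True if any consecutive pair of sites is not adjacent on the capsid."""
    return any(b not in neighbors[a] for a, b in zip(path, path[1:]))

def is_hamiltonian(path, neighbors):
    """True if the path visits every site exactly once via neighboring sites."""
    return (set(path) == set(neighbors)
            and not revisits(path)
            and not has_nonneighbor_jump(path, neighbors))

# Toy neighbor graph: four sites arranged on a square.
square = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
ordered = rna_path([(10, 0), (11, 0), (40, 1), (90, 2), (120, 3)])
tangled = rna_path([(5, 0), (20, 2), (30, 0)])
print(ordered, is_hamiltonian(ordered, square))
print(tangled, revisits(tangled), has_nonneighbor_jump(tangled, square))
```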

In Fig. 8 we quantify the RNA path persistence time (the time required to change conformation) within assembled 
capsids. In Fig. 8A, we see that non-cognate RNAs and RNAs with the Combo PS sequences (primarily LA PSs) are 
highly dynamic; a new RNA path is observed at almost every observation time. In contrast, an RNA with 20 HA 
PSs is far less dynamic, with existing paths persisting for far longer periods of time. To help visualize the differences 
between these two classes of dynamics, schematics of pathways observed at five different observation times for different 
RNA sequences are shown in Fig. 8B. While the paths for the 20 HA PS sequence are nearly identical, the paths 
change significantly on this timescale for the other two sequences. We find that assembly stalls when the RNA 
becomes frozen in a conformation whose geometry hinders recruitment of additional subunits to the assembling capsid. 
Two examples of such conformations are shown in Fig. 8C. Under parameters which promote RNA dynamics, such 
conformations are transient. 
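
One simple way to summarize how dynamic a sequence of observed RNA paths is can be sketched as follows. The two summary statistics (number of distinct paths, and fraction of observations at which the path differs from the previous one) are illustrative choices, not necessarily the quantity plotted in the figure, and the function name is hypothetical.

```python
def path_dynamics(paths):
    """Summarize path turnover over a series of observation times.

    paths: list of RNA paths (each a sequence of site ids), one per
    observation time. Returns (number of distinct paths, fraction of
    consecutive observations at which the path changed).
    """
    distinct = {tuple(p) for p in paths}
    changes = sum(1 for a, b in zip(paths, paths[1:]) if tuple(a) != tuple(b))
    fraction = changes / (len(paths) - 1) if len(paths) > 1 else 0.0
    return len(distinct), fraction

# A frozen RNA repeats one path; a dynamic RNA changes at every observation.
frozen = [[0, 1, 2]] * 5
dynamic = [[0, 1, 2], [0, 2, 1], [1, 0, 2], [2, 1, 0], [0, 1, 2]]
print(path_dynamics(frozen), path_dynamics(dynamic))
```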

Notably, as the number of PSs is increased beyond the stoichiometric value Nps = 20 for any distribution, the 
RNA dynamics increases (Fig. 8A). This observation explains the increase in yield for the HA PS distribution at high 
Nps. This facilitation arises because excess PSs can displace bound PSs from a given three-fold site without requiring 
complete loss of interaction at that site, thus avoiding the activation barrier associated with PS-PSR unbinding (see 
SI section SII B and Fig. S6). In contrast, for Nps < 20 exchange requires dissolution of a PS-PSR interaction. In 
effect, RNA with excess PSs can slide on capsid intermediates to sample different conformations and thus escape from 
unproductive traps. Previous simulations [42, 43] and a theoretical model [95] have suggested the importance of RNA 
rearrangements and subunit ‘sliding’ during assembly around non-cognate polyelectrolytes. 



[Figure 8 panel labels: Non-cognate; Combo, 10 PS; Strong, 20 PS] 


FIG. 8. (A) Rate of path discovery during a dynamic trajectory for RNAs within a preassembled capsid. (B) Schematic 
representation of RNA path within the capsid at intervals of 5 x 10^ timesteps. Line indicates path of RNA, with line color and 
width changing gradually with contour length for clarity. (C) Snapshots and schematics indicating non-optimal RNA paths 
which lead to stalled assemblies. PSs are shown with exaggerated size to improve visibility. Segments of interest are shown in 
green. 


V. DISCUSSION 

In this article, we have described simulations of capsid assembly around RNA, represented as a flexible polyelectrolyte 
with sequence-specific protein-RNA interactions, or packaging sites (PSs). By performing extensive simulations 
over a range of ionic strengths, simulated protein-protein interaction strengths, and strengths and numbers of PSs, we 
have explored how PSs alter the pathways and products of capsid assembly reactions, and the extent to which they 
induce specificity against polyelectrolytes without sequence-specific interactions (e.g. non-cognate RNAs). We find 
that PSs can confer arbitrarily high specificity over RNAs with uniform nonspecific interactions, but that the degree 
of specificity is sensitive to the underlying assembly driving forces, which can be tuned by solution conditions (ionic 
strength, pH) as well as capsid protein charge (ARM sequence). The best specificity is conferred under conditions 
where the nonspecific interactions alone are slightly too weak to promote effective assembly. 

Assembly and specificity are also sensitive to the affinity and number of PSs, with the optimal distribution of 
PSs depending on the solution conditions. Our simulations suggest that the PS sequences that confer the highest 
specificity and are most robust to solution conditions contain one or a few high affinity PSs and a stoichiometric 
or small excess of low affinity PSs. This observation is consistent with recent models m, observation of multiple 
weak PSs in viral genomes m and recent in vitro measurements m- Our simulations identify multiple mechanisms 
by which PSs can confer specificity, depending on the protein sequence and solution conditions. Under conditions 
where protein subunit-subunit or sequence-independent protein-RNA interactions are too weak to nucleate assembly, 
PSs that enhance protein-RNA interactions and RNA-mediated protein-protein interactions can induce nucleation 
and facilitate subsequent assembly. Under conditions where strong subunit-subunit and nonspecific subunit-RNA 
interactions lead to multiple, geometrically incompatible partial capsids forming on individual RNAs, efficient and 
specific assembly can be realized by PS sequences that favor nucleation and rapid assembly of a single partial capsid. 
Finally, the simulations demonstrate that PSs can dramatically alter assembly pathways in comparison to non-cognate 
RNAs (Fig. 5), as observed in recent experiments [36], but that the effect on assembly pathways is sensitive to solution 
conditions. 


Our simulation results suggest that rapid, specific assembly can proceed via a diverse ensemble of pathways, provided 
that RNA conformations can anneal during assembly through reversible interactions and/or cooperative RNA-protein 
rearrangements. This finding is consistent with the observation that proteins can fold by multiple, dissimilar pathways 
[96] . In particular, the RNA does not trace a Hamiltonian path, as has been inferred from structures of T=3 MS2 
capsids [33] and assumed in other models Eain]. However, the expectation of a Hamiltonian path relies on coupling 
between PS binding and subunit conformation, which is not present in our model of a T=1 capsid. 

Dykeman et al. [22] recently showed that the gradually increasing protein concentration characteristic of an 
MS2-infected E. coli can increase specificity for a cognate RNA in comparison to assembly under a fixed protein 
concentration. Enhanced specificity arises in their model because during the initial stages of the reaction nucleation occurs 
only around cognate RNAs, similar to the behavior in our simulations for high salt and low εss (Fig. |^). When the 
protein concentration increases during later stages it is rapidly consumed by growth of the partial capsid-cognate 
RNA complexes. While simulations with time-varying protein concentrations are beyond the scope of the present 
work, we anticipate that similar specificity enhancements would arise in our model. 


Implications for experiments. Our simulations predict that the specificity conferred by PSs is sensitive to 
parameters that control the pathways and efficiency of sequence-independent assembly. While it has been previously 
suggested that the degree of specificity observed in in vitro experiments is sensitive to subunit concentration, the 
predicted phase diagrams reveal that varying ionic strength, pH, or protein-RNA binding sequences (through mutagenesis) 
could shift an experiment from selective to nonselective. This result may shed light on the varied degrees 
of specificity observed in previous competition experiments [m EHl mi 13 ED- However, note that the locations of 
boundaries within the predicted phase diagrams depend on the protein-RNA binding sequence [44]. Furthermore, 
experimentally measured specificity has been defined in different ways, depending on whether capsid protein is in excess 
or limiting. Our simulations find that these two conditions and definitions can lead to similar or different observed 
levels of specificity, depending on the location in parameter space (see section IVA). An intriguing prediction 
from our simulations is that excess PSs (in comparison to the number of binding sites within a complete capsid) can 
increase assembly under some conditions by promoting exchange of improperly bound PSs. This prediction, as well 
as the general dependencies on the affinity and number of PSs, could be tested by constructing RNA fragments with 
varying numbers of high- and low-affinity PSs. 
Supporting Information 


SI. MODEL AND SIMULATION DETAILS 
A. Model details 

In our model, all potentials can be decomposed into pairwise interactions. Potentials involving capsid subunits 
further decompose into pairwise interactions between their constituent building blocks: the excluders, attractors, 
‘Top’ and ‘Bottom’, and ARM pseudoatoms. It is convenient to state the total energy of the system as the sum of 6 terms: 
a capsid subunit-subunit part Uss (which does not include interactions between ARM pseudoatoms), subunit-ARM 
Usa, polymer-polymer (i.e. RNA-RNA) Upp, ARM-ARM Uaa, polymer-ARM Upa, and subunit-polymer Usp (which 
includes the PS-PSR interactions), each summed over all pairs of the appropriate type: 

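$$
U = \sum_{\text{sub } i} \sum_{\text{sub } j<i} U_\text{ss}
  + \sum_{\text{sub } i} \sum_{\text{ARM } j} U_\text{sa}
  + \sum_{\text{poly } i} \sum_{\text{poly } j<i} U_\text{pp}
  + \sum_{\text{ARM } i} \sum_{\text{ARM } j<i} U_\text{aa}
  + \sum_{\text{poly } i} \sum_{\text{ARM } j} U_\text{pa}
  + \sum_{\text{sub } i} \sum_{\text{poly } j} U_\text{sp}
$$

where $\sum_{\text{sub } i} \sum_{\text{sub } j<i}$ is the sum over all distinct pairs of capsid subunits in the system, 
$\sum_{\text{sub } i} \sum_{\text{poly } j}$ is the sum over all subunit-polymer pairs, etc. Note that unless otherwise 
stated, polyelectrolyte segments and polymer segments designated as PS have the same interactions and parameters. 
The exceptions are PS-PS interactions and PS-PSR interactions, as described below. 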

The capsid subunit-subunit potential Uss is the sum of the attractive interactions between complementary attractors, 
and geometry-guiding repulsive interactions between ‘Top’ - ‘Top’ pairs and ‘Top’ - ‘Bottom’ pairs. There are no 
interactions between members of the same rigid body, but ARMs are not rigid and thus there are intra-subunit 
ARM-ARM interactions. Thus, for notational clarity, we index rigid bodies and non-rigid pseudoatoms in Roman, while the 
pseudoatoms comprising a particular rigid body are indexed in Greek. For subunit i we denote its attractor positions 
as $\{\mathbf{a}_{i\alpha}\}$ with the set comprising all attractors $\alpha$, its ‘Top’ positions $\{\mathbf{t}_{i\alpha}\}$, and its ‘Bottom’ positions $\{\mathbf{b}_{i\alpha}\}$. The 
capsid subunit-subunit interaction potential between two subunits i and j is then defined as: 


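$$
\begin{aligned}
U_\text{ss}(\{\mathbf{a}_{i\alpha}\},\{\mathbf{t}_{i\alpha}\},\{\mathbf{b}_{i\alpha}\},\{\mathbf{a}_{j\beta}\},\{\mathbf{t}_{j\beta}\},\{\mathbf{b}_{j\beta}\}) ={}& \varepsilon_\text{ss} \sum_{\alpha,\beta}^{N_\text{t}} \mathcal{L}\left(|\mathbf{t}_{i\alpha} - \mathbf{t}_{j\beta}|, \sigma_\text{t}\right) \\
&+ \varepsilon_\text{ss} \sum_{\alpha,\beta}^{N_\text{b},N_\text{t}} \mathcal{L}\left(|\mathbf{b}_{i\alpha} - \mathbf{t}_{j\beta}|, \sigma_\text{b}\right) \\
&+ \varepsilon_\text{ss} \sum_{\alpha,\beta}^{N_\text{a}} \mathcal{M}\left(|\mathbf{a}_{i\alpha} - \mathbf{a}_{j\beta}|, r_0, \varrho, r_\text{cut}\right)
\end{aligned}
\tag{3}
$$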


where εss is an adjustable parameter which both sets the strength of the capsid subunit-subunit attraction at each 
attractor site and scales the repulsive interactions which enforce the dodecahedral geometry, Nt, Nb, and Na are 
the numbers of ‘Top’, ‘Bottom’, and attractor pseudoatoms respectively in one subunit, σt and σb are the effective 
diameters of the ‘Top’ - ‘Top’ interaction and ‘Bottom’ - ‘Top’ interaction, which are set to 10.5 nm and 9 nm, 
respectively, r0 is the minimum energy attractor distance, set to 1 nm, ϱ is a parameter determining the width of the 
attractive interaction, set to 2.5, and rcut is the cutoff distance for the attractor potential, set to 10 nm. 

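The function $\mathcal{L}$ is defined as the repulsive component of the Lennard-Jones potential shifted to zero at the interaction 
diameter: 

$$
\mathcal{L}(x, \sigma) \equiv
\begin{cases}
\left(\dfrac{\sigma}{x}\right)^{12} - 1 & : x < \sigma \\
0 & : \text{otherwise}
\end{cases}
\tag{4}
$$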


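The function $\mathcal{M}$ is a Morse potential: 

$$
\mathcal{M}(x, r_0, \varrho, r_\text{cut}) \equiv
\begin{cases}
\left(e^{\varrho\left(1 - x/r_0\right)} - 2\right) e^{\varrho\left(1 - x/r_0\right)} - V_\text{shift}(r_\text{cut}) & : x < r_\text{cut} \\
0 & : \text{otherwise}
\end{cases}
\tag{5}
$$

with $V_\text{shift}(r_\text{cut})$ the value of the unshifted potential at $r_\text{cut}$. 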
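
As a numerical illustration of the two pair functions, the sketch below implements a shifted Lennard-Jones repulsion and a shifted Morse potential in Python. The functional forms are assumptions: the repulsion is taken as (σ/x)^12 − 1 for x < σ, and the Morse potential is written in a standard unit-depth form with its minimum at r0; the function names are hypothetical. The default parameters are the values quoted in the text (r0 = 1 nm, ϱ = 2.5, rcut = 10 nm).

```python
import math

def lj_repulsion(x, sigma):
    """Repulsive Lennard-Jones term, shifted to zero at x = sigma (assumed form)."""
    return (sigma / x) ** 12 - 1.0 if x < sigma else 0.0

def morse(x, r0=1.0, rho=2.5, r_cut=10.0):
    """Unit-depth Morse potential shifted to zero at r_cut (assumed form).

    Distances are in nm; the result is in units of the well depth, which
    multiplies this function in the subunit-subunit potential.
    """
    def unshifted(r):
        e = math.exp(rho * (1.0 - r / r0))
        return (e - 2.0) * e
    return unshifted(x) - unshifted(r_cut) if x < r_cut else 0.0

# The Morse well bottom sits at r0; the repulsion vanishes at the diameter.
print(morse(1.0), morse(10.0), lj_repulsion(10.5, 10.5))
```

With these parameters the shift at the cutoff is negligible, so the well depth at r0 is essentially one energy unit.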

The capsid subunit-ARM interaction is composed of a short-range repulsion representing the excluded volume. For 
subunit i with excluder positions $\{\mathbf{x}_{i\alpha}\}$ and ARM segment j with position $\mathbf{R}_j$, the potential is: 

$$
U_\text{sa}(\{\mathbf{x}_{i\alpha}\}, \mathbf{R}_j) = \sum_{\alpha}^{N_\text{x}} \mathcal{L}\left(|\mathbf{x}_{i\alpha} - \mathbf{R}_j|, \sigma_\text{xa}\right)
\tag{6}
$$

where σxa = 0.5(σx + σa) is the effective diameter of the excluder-ARM repulsion, with σa = 0.5 nm the diameter of an ARM 
bead. 

The interactions between polymer segments are defined as follows. Polymer segments which occupy adjacent 
positions within a polymer chain experience only a harmonic potential $\mathcal{K}_\text{bond}$ which depends on bond distance. The 
polymer-polymer non-bonded interaction is composed of electrostatic repulsion and short-ranged excluded volume 
interactions, as well as an additional longer-range repulsion between PS segments added to prevent multiple PS 
segments from occupying a single binding site. The potential between two polymer segments i and j is given by 

$$
U_\text{pp}(\mathbf{R}_i, \mathbf{R}_j) =
\begin{cases}
\mathcal{K}_\text{bond}(R_{ij}, \sigma_\text{p}, k_\text{bond}) & : \{i,j\}\ \text{bonded} \\
\mathcal{L}(R_{ij}, \sigma_\text{p}) + \chi_\text{ps}(i)\chi_\text{ps}(j)\,\mathcal{L}(R_{ij}, \sigma_\text{ps}) + U_\text{DH}(R_{ij}, q_\text{p}, q_\text{p}, \sigma_\text{p}) & : \{i,j\}\ \text{nonbonded}
\end{cases}
\tag{7}
$$

where $R_{ij} = |\mathbf{R}_i - \mathbf{R}_j|$ is the center-to-center distance between the polymer segments and σp = 0.5 nm is the diameter 
of a polymer segment. The harmonic bond potential between sequential segments is given by 

$$
\mathcal{K}_\text{bond}(R_{ij}, \sigma, k_\text{bond}) \equiv \frac{k_\text{bond}}{2}\left(R_{ij} - \sigma\right)^2
\tag{8}
$$


The function χps(i) = 1 if segment i is a packaging site and 0 otherwise. This interaction term accounts for steric 
interactions which inhibit multiple packaging sites from interacting with the same site on a capsid protein by adding 
an additional repulsive interaction with effective diameter σps = 3 nm. The final term in Eq. 7 is a Debye-Hückel 
potential accounting for screened electrostatic interactions between polymer segments with valence charge qp = −1, 
given by 

$$
U_\text{DH}(r, q_1, q_2, \sigma)/k_\text{B}T =
\begin{cases}
\ldots & : r < 2\lambda_\text{D} \\
\ldots & : 2\lambda_\text{D} < r < 3\lambda_\text{D} \\
0 & : \text{otherwise}
\end{cases}
\tag{9}
$$

with λD as the Debye length, lb as the Bjerrum length, q1 and q2 as the valences of the interacting charges, and the 
potential is smoothly switched to zero at the cutoff distance r = 3λD. 

The ARM-ARM interaction is similar to the polymer-polymer interaction, consisting of non-bonded interactions 
composed of electrostatic repulsions and short-ranged excluded volume interactions, as well as bonded interactions 
between sequential monomers in the ARM chain: 

$$
U_\text{aa}(\mathbf{R}_i, \mathbf{R}_j) =
\begin{cases}
\mathcal{K}_\text{bond}(R_{ij}, \sigma_\text{a}, k_\text{bond}) & : \{i,j\}\ \text{bonded} \\
\mathcal{L}(R_{ij}, \sigma_\text{a}) + U_\text{DH}(R_{ij}, q_i, q_j, \sigma_\text{a}) & : \{i,j\}\ \text{nonbonded}
\end{cases}
\tag{10}
$$

where $R_{ij} = |\mathbf{R}_i - \mathbf{R}_j|$ is the center-to-center distance between the ARM segments and qi is the valence of charge on 
ARM segment i. For the simulations described in this work, qi = −1 for all ARM segments. 
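
To give a sense of the length scales that enter the Debye-Hückel term, the following Python sketch evaluates the Debye and Bjerrum lengths from their standard definitions. The solvent parameters (water with relative permittivity 78.4 at 298.15 K) and the treatment of the salt as a 1:1 electrolyte are assumptions of this sketch, not values taken from the text, and the function names are hypothetical.

```python
import math

E0 = 8.8541878128e-12   # vacuum permittivity, F/m
KB = 1.380649e-23       # Boltzmann constant, J/K
QE = 1.602176634e-19    # elementary charge, C
NA = 6.02214076e23      # Avogadro constant, 1/mol

def bjerrum_length_nm(eps_r=78.4, temp=298.15):
    """Distance at which two unit charges interact with energy kB*T."""
    return QE ** 2 / (4.0 * math.pi * E0 * eps_r * KB * temp) * 1e9

def debye_length_nm(c_salt_molar, eps_r=78.4, temp=298.15):
    """Debye screening length for a 1:1 salt at the given molar concentration."""
    ionic = c_salt_molar * 1000.0  # ionic strength in mol/m^3
    return math.sqrt(E0 * eps_r * KB * temp / (2.0 * NA * QE ** 2 * ionic)) * 1e9

for c in (0.1, 0.5):
    lam = debye_length_nm(c)
    # The text switches the potential off at three Debye lengths.
    print(f"Csalt = {c * 1000:.0f} mM: Debye length {lam:.2f} nm, cutoff {3 * lam:.2f} nm")
print(f"Bjerrum length {bjerrum_length_nm():.2f} nm")
```

Under these assumptions the screening length is roughly 0.96 nm at 100 mM and 0.43 nm at 500 mM, which is why nonspecific electrostatics are much weaker at high salt.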

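The ARM-polymer interaction is the sum of repulsive, short-ranged excluded volume interactions and electrostatic 
interactions: 

$$
U_\text{pa}(\mathbf{R}_i, \mathbf{R}_j) = \mathcal{L}(R_{ij}, \sigma_\text{ap}) + U_\text{DH}(R_{ij}, q_i, q_j, \sigma_\text{ap})
\tag{11}
$$

with σap = 0.5 nm. 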

The capsid subunit-polymer interaction is a short-ranged repulsion representing the excluded volume, with an 
additional attractive interaction between the packaging site receptor (PSR) on the subunit and polymer segments 
which are packaging sites (PS). For capsid subunit i with excluder positions $\{\mathbf{x}_{i\alpha}\}$ and PSR positions $\{\mathbf{c}_{i\alpha}\}$, and polymer 
segment j with position $\mathbf{R}_j$, the potential is: 

$$
U_\text{sp}(\{\mathbf{x}_{i\alpha}\}, \{\mathbf{c}_{i\alpha}\}, \mathbf{R}_j) = \sum_{\alpha}^{N_\text{x}} \mathcal{L}\left(|\mathbf{x}_{i\alpha} - \mathbf{R}_j|, \sigma_\text{xp}\right) + \chi_\text{ps}(j) \sum_{\alpha}^{N_\text{c}} \varepsilon_\text{ps}\,\mathcal{M}\left(|\mathbf{c}_{i\alpha} - \mathbf{R}_j|, r_0, \varrho, r_\text{cut}\right)
\tag{12}
$$

where Nx is the number of excluders on a capsid subunit, Nc is the number of PSRs on a subunit, σxp = 0.5(σx + σp) 
is the effective diameter of the excluder-polymer repulsion, with σx = 3 nm and σp = 0.5 nm the respective diameters 
of excluder and polymer beads. The Morse parameters used for the PS-PSR interaction are the same as those used in 
the attractor-attractor interaction, except for εps, which is set to 5 or 20 kBT, as discussed in the main text. Note 
that the second term of Eq. 12 only applies to the polymer segments (j) which are PSs. 


B. Simulations 

Trajectories are simulated using the Brownian Dynamics algorithm of HOOMD, which uses the Langevin equation 
to calculate the time evolution of positions and rigid body orientations [68]. For each of our dynamical assembly 
simulations, the box size is 200 x 200 x 200 nm. Except where mentioned otherwise, the box contained 60 subunits, 
resulting in a subunit concentration of 12 μM. Assembly simulations were run at least 10 times for each parameter set, 
and each was concluded at tend = 2 x 10^ time steps. For each pseudoatom, 7 was assigned as its effective interaction 
diameter. VMD was used to visualize the model conformations [97]. We calculated the hydrodynamic radius RH 
using HYDROPRO [91] as discussed in our previous work [44]. 
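
The quoted subunit concentration follows directly from the box size and subunit count. A minimal Python check, with a hypothetical function name:

```python
AVOGADRO = 6.02214076e23  # 1/mol

def concentration_micromolar(n_particles, box_edge_nm):
    """Molar concentration (in micromolar) of n particles in a cubic box."""
    volume_liters = (box_edge_nm ** 3) * 1e-24  # 1 nm^3 = 1e-24 L
    moles = n_particles / AVOGADRO
    return moles / volume_liters * 1e6

# 60 subunits in a box with 200 nm edges.
print(f"{concentration_micromolar(60, 200.0):.1f} uM")
```

This gives about 12.5 μM, consistent with the rounded value of 12 μM stated above.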
C. Binding free energy estimates 

We have previously calculated the free energy of subunit dimerization to be gss/kBT = 5.0 − 1.5 εss [19, 44]. Briefly, 
simulations were set up with subunits which were limited to dimer formation, and the concentration of dimers was 
measured for varying εss. The free energy of binding along that interface is then gss/kBT = −ln(css/Kd), with standard 
state concentration css = 1 M and Kd in molar units, and adjusted for the multiplicity of dimer conformations. The 
correction to gss due to electrostatic repulsion between ARMs is ≈ 0.5 kBT at Csalt = 100 mM. 
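
The relation between the well depth, the dimerization free energy, and the dissociation constant can be made concrete with a short Python sketch. The linear coefficients (5.0 and 1.5) are read from the damaged source text and should be treated as assumptions, as should the function names; the conversion Kd = css exp(g/kBT) follows from the definition g/kBT = −ln(css/Kd) with css = 1 M.

```python
import math

def dimer_free_energy(eps_ss, intercept=5.0, slope=1.5):
    """Dimerization free energy in kBT from the linear fit (assumed coefficients)."""
    return intercept - slope * eps_ss

def kd_molar(g_over_kt, c_ss=1.0):
    """Dissociation constant in molar units from g/kBT = -ln(c_ss/Kd)."""
    return c_ss * math.exp(g_over_kt)

for eps in (2, 4, 6):
    g = dimer_free_energy(eps)
    print(f"eps_ss = {eps} kBT: g = {g:+.1f} kBT, Kd = {kd_molar(g):.3g} M")
```

The large Kd values obtained this way reflect that individual subunit-subunit contacts are weak; the ARM repulsion correction mentioned in the text is not included here.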

We follow a similar strategy to calculate the binding free energy of the PS-PSR interaction. Here, we set up a 
simplified system containing a single trimeric PSR, composed of the PSRs from three subunits, as well as the ARM 
and excluder pseudoatoms. The subunits (excepting the ARMs) are immobilized to prevent disassembly. We then 
measured the relative concentration of PS bound and unbound states for a range of attraction strengths (εps). The free 
energy of binding along that interface is then gps/kBT = −ln(css/Kd), with standard state concentration css = 1 M 
and Kd in molar units. At Csalt = 100 mM, the free energy is well fit by the linear expression gps/kBT = −1.5 εps − T sb, 
where T sb = 2.3 kBT (Fig. S1A). By this estimate our LA PS (εps = 5 kBT) has ^ gM. Note that this is the 
free energy of the PS binding to a complete, trimeric binding site; however, in many of our assembly simulations, the 
subunits do not form stable trimers without the PS. Therefore the PS-PSR interactions that occur during assembly 
often involve only one or two PSRs and non-optimal geometries. 

Fig. S1B shows data from two related sets of simulations in which we measure the fraction of time a PS binds to 
PSRs in systems which contain a full length polyelectrolyte (575 segments at Csalt = 100 mM) and 12 subunits. In 
one case the 12 subunits are assembled into a complete capsid, while in the other the subunits are adsorbed onto the 
polyelectrolyte but unassembled (εss was set to zero). 
Interestingly, while both curves are sigmoidal in shape, at binding strengths comparable to our LA PS (εps = 5 kBT) 
PSs within complete capsids are nearly always bound to PSRs, whereas they spend less than half their time bound to 
unassembled subunits. This difference arises because PSRs have ideal geometries within a capsid but are disordered 
in the unassembled subunits. Note that the binding probabilities measured in these simulations reflect a partitioning 
of PSs between specific binding to PSRs and nonspecific binding to subunit ARMs through electrostatics, whereas 
the free energies calculated in Fig. S1A reflect the partitioning between PSRs and solution. 




[Figure S1 axis labels: εps (panels A and B)] 

FIG. S1. (A) Free energy of PS binding to a single complete, trimeric PSR binding site, as a function of PS-PSR interaction 
well depth εps. The symbols indicate measured data points and the line shows a linear fit. (B) Fraction of time a PS is 
bound to PSRs as a function of εps, measured within an assembled capsid (■ symbols) and in the presence of 12 adsorbed but 
unassembled subunits (• symbols). The polymer has 575 segments with one PS and Csalt = 100 mM. 
SII. ADDITIONAL RESULTS AND ANALYSIS 


A. Effect of a single PS on assembly dynamics and specificity 


The possibility that a single high affinity PS within a viral genome could promote specificity by functioning as a nucleation 
site has been considered for several decades (e.g. [31]), and the specificity conferred by a single PS was examined by in 
vitro experiments in which two species of heterologous RNA competed, one of which contained a single high affinity 
PS [31]. These experiments identified modest selectivity (≈ 2/3) for the RNA with one PS. To evaluate our results 
in the context of that experiment and to more broadly understand the limits of selectivity conferred by a single PS, 
we present simulations comparing non-cognate assembly with RNA containing a single HA PS located in the center 
of the polymer (as is the case for R17/MS2 [31]). We consider assembly at two salt concentrations (I = 100, 500 mM) 
in order to describe the effect of a PS under strong and weak electrostatic interactions respectively. For each salt 
concentration, we consider a range of subunit-subunit attraction strengths (εss/kBT ∈ [2, 5] for Csalt = 100 mM and 
εss/kBT = 6, 7 for Csalt = 500 mM) over which assembly yields in the presence of the non-cognate RNA alone vary 
from zero to high (see Fig. [^). In the limit of low εss we observe disordered aggregates (Csalt = 100 mM, εss = 2 kBT) 
or failure to nucleate (Csalt = 500 mM, εss = 6 kBT) around the non-cognate RNA. The setup for these simulations is 
the same as for simulations presented in the main text. 

In Fig. S2A-F the average size of the largest cluster of subunits is plotted as a function of time. For the parameters 
in which the non-cognate RNA triggers rapid assembly (C_salt = 100 mM, ε_ss = 4, 5 k_BT), incorporation of the PS does 
not substantially alter the time course of assembly. For cases in which assembly around the non-cognate RNA is slow 
(C_salt = 100 mM, ε_ss = 3 k_BT and C_salt = 500 mM, ε_ss = 7 k_BT), the PS increases the assembly rate although 
long-time yields are similar. For cases in which the non-cognate RNA leads to no assembly on the investigated timescale 
(C_salt = 100 mM, ε_ss = 2 k_BT and C_salt = 500 mM, ε_ss = 6 k_BT), the presence of a single PS increases the extent of 
assembly, but complete capsids do not form in the timescale considered here. As seen in Fig. [?], additional PSs are 
needed to induce efficient assembly at these parameters. 
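
The largest-cluster observable can be illustrated with a short sketch. It assumes that subunit-subunit bonds in a single frame have already been identified by some distance or energy criterion; that criterion, the function name `largest_cluster_size`, and the input format are illustrative assumptions and are not taken from the simulation code used in this work.

```python
from collections import defaultdict, deque


def largest_cluster_size(n_subunits, bonds):
    """Return the size of the largest connected cluster of subunits.

    n_subunits: total number of subunits, labelled 0..n_subunits-1.
    bonds: iterable of (i, j) pairs of bonded subunits (assumed input).
    A free subunit counts as a cluster of size 1.
    """
    neighbors = defaultdict(set)
    for i, j in bonds:
        neighbors[i].add(j)
        neighbors[j].add(i)

    seen = set()
    largest = 0
    for start in range(n_subunits):
        if start in seen:
            continue
        # Breadth-first search over one connected component.
        queue = deque([start])
        seen.add(start)
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        largest = max(largest, size)
    return largest
```

Averaging this quantity frame by frame over independent trajectories gives a curve of the kind described above.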

Figure S2G shows the assembly yields around a single RNA with and without the PS for C_salt = 100 mM. 20 
simulations were run for each parameter value. As discussed above, for C_salt = 100 mM, ε_ss = 5 k_BT assembly is 
robust without the PS, and the effect of the PS on assembly is slight, while at ε_ss = 3, 4 k_BT the increase in yield due 
to the PS is moderate. At ε_ss = 2 k_BT the presence of a single PS is not enough to allow for successful assembly 
and disordered aggregates are observed. In Figure S2H we compare the selectivity predicted by Eq. [?] (using the 
data from Fig. S2A-G) against the results of explicit competition simulations. In these competition simulations, 
there are 60 subunits, one RNA containing a HA PS and one non-cognate RNA (thus r_ex = 1). We find that the 
predicted selectivities match those from the explicit competition simulations quite closely, especially considering the 
limited number of independent simulations. Moreover, the finding that about 2/3 of the assembled capsids contain the 
PS-RNA roughly agrees with the experimental observations of Beckett et al. [31], in which incorporation of a single 
high affinity PS led to ~2/3 selectivity under conditions where assembly around heterologous RNA was efficient. 
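
To give a sense of the sampling noise implied by a limited number of independent runs, the sketch below computes a yield estimate and its binomial standard error. The counts in the usage note are hypothetical and are not data from this work; the function name is an illustrative choice.

```python
import math


def yield_with_standard_error(n_complete, n_runs):
    """Estimate the assembly yield and its binomial standard error.

    n_complete: number of runs that ended in a complete capsid.
    n_runs: number of independent runs (e.g. 20 per parameter value).
    """
    if n_runs <= 0:
        raise ValueError("n_runs must be positive")
    p = n_complete / n_runs
    # Standard error of a binomial proportion estimated from n_runs trials.
    se = math.sqrt(p * (1.0 - p) / n_runs)
    return p, se
```

For a hypothetical 10 complete capsids out of 20 runs, the estimated yield is 0.5 with a standard error of roughly 0.11, so differences of about 0.1 between conditions are within sampling noise.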
FIG. S2. (A-F) Average cluster size as a function of simulation timestep with and without a single HA PS for varying salt 
concentration and subunit-subunit attraction. These results are the average of ten independent assembly simulations, each run 
using a single substrate and subunits at a concentration of 12 μM. (G) Yield of complete capsids from assembly simulations 
around a single non-cognate RNA or an RNA containing one HA PS, at several values of ε_ss and C_salt = 100 mM. (H) 
Comparison between specificity observed in explicit competition simulations and predicted by Eq. [?]. The competition 
simulations each contained one non-cognate RNA, one cognate RNA containing one HA PS, and 60 subunits. The predictions 
use data from (A-G). 


B. Capsid and RNA subunit paths and conformational dynamics 


It has been proposed that PSs increase assembly rates by reducing the diversity of assembly pathways, essentially 
eliminating ‘dead end’ pathways. To characterize the relationship between PSs and the ensemble of assembly pathways 
generated by our model, we evaluated the effects of PSs on the diversity of assembly intermediates, both from the 
perspective of arrangements of subunits in interacting clusters and of RNA conformations. 
1. Partial capsid intermediate geometries 

We used two approaches to characterize the diversity of capsid intermediate geometries generated during assembly 
trajectories. In the first, we calculated the fraction of lowest-energy intermediates (a lowest-energy intermediate 
contains the maximum possible number of subunit-subunit interactions for its size) for clusters with 3-11 subunits. 
In the second, we categorized system configurations according to: the number of clusters, the number of subunits in 
each cluster, and the number of bonds in each cluster. We then calculated the entropy S of the distribution according 
to S = -\sum_\nu p_\nu \log p_\nu, with p_\nu the relative probability of configuration \nu. 
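
The entropy of the configuration distribution can be sketched as follows. This is a minimal illustration, assuming each frame has already been reduced to a list of (number of subunits, number of bonds) pairs, one per cluster; the data layout and the function names are hypothetical and not taken from the analysis code used in this work.

```python
import math
from collections import Counter


def configuration_key(clusters):
    """Build an order-independent label for one system configuration.

    clusters: list of (n_subunits, n_bonds) tuples, one per cluster.
    The label encodes the number of clusters (its length) together with
    the size and bond count of each cluster.
    """
    return tuple(sorted(clusters))


def configuration_entropy(frames):
    """Shannon entropy S = -sum_v p_v log p_v over observed configurations."""
    counts = Counter(configuration_key(frame) for frame in frames)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())
```

A trajectory that always visits the same configuration gives S = 0, while two equally probable configurations give S = log 2.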




[Figure S3; horizontal axes of panels (E-H): number of PSs, N_PS] 


FIG. S3. The fraction of assembly intermediates which contain the maximum number of subunit-subunit bonds (A,C,E,G) 
and the entropy of the distribution of intermediate configurations (B,D,F,H). Results are shown as a function of ε_ss and 
C_salt for a non-cognate RNA (A,B) and the Combo PS sequence (C,D). Results are shown as a function of the number of PSs 
N_PS, with ε_ss = 2 k_BT and C_salt = 100 mM (E,F) and ε_ss = 6 k_BT and C_salt = 500 mM (G,H). In (E-H) the RNA contained 
either N_PS HA PSs (• symbols) or 1 HA PS and N_PS LA PSs (▲ symbols). Results for the non-cognate RNA (no PSs) are 
indicated by ■ symbols. 

Fig. S3A,B shows the results for non-cognate RNA assembly, as a function of salt concentration and subunit-subunit 
interaction. We see that weak subunit-subunit interactions and high salt concentrations lead to a high fraction of 
optimal intermediate geometries and consequently a low entropy of the distribution of intermediate configurations. 
As ε_ss increases or C_salt decreases the distribution of intermediate configurations widens, indicating more diverse 
assembly pathways. The results are consistent with a previous study of uniform polyelectrolytes [44], in which we 
found that high salt and moderate subunit-subunit interactions lead to an ‘ordered’ nucleation-and-growth assembly 
which proceeds through a series of well-formed intermediates. Stronger interactions (either electrostatic or 
subunit-subunit) tend to promote the formation of multiple clusters and stabilize non-optimal configurations. 

The results with the Combo PS sequence (Fig. S3C,D) show that PSs tend to increase the diversity of intermediate 
configurations and the prevalence of non-optimal cluster geometries. The effect is particularly noticeable under conditions 
where the underlying interactions are weak (high salt, low ε_ss). Although the results as a function of varying N_PS 
(Fig. S3E-H) are noisy, we generally observe an increase in the width of the distribution and more non-optimal clusters 
as more PSs are added, and the HA PSs have a stronger effect than the LA PSs. 

In contrast to the general observation that adding PSs leads to a wider diversity of intermediate geometries, we 
find that PSs narrow the distribution of intermediate geometries under parameters where electrostatics and 
subunit-subunit interactions are both strong (e.g. C_salt = 50 mM, ε_ss = 7 k_BT). This effect is significant; as discussed in 
the main text (section IV), PSs increase assembly yields in this regime even though the nonspecific interactions are 
sufficiently strong to promote assembly (Fig. [?]). To clarify the mechanism by which PSs influence pathways in this 
regime, we separately characterized their effect on intermediate geometries and distribution of intermediates. In 
particular, we calculate the average deviation from the ground state for clusters with n subunits, given by [65] 

    \langle \Delta B(n) \rangle = \left\langle B^{\min}(n) - B(c) \right\rangle_{c:\, n(c) = n}    (13) 

where B(c) is the number of subunit-subunit interactions for configuration c, n(c) is the number of subunits in 
configuration c, B^{\min}(n) is the number of bonds in the minimum-energy configuration of n subunits, and the average 
is taken over all configurations weighted by their frequency of appearance in assembly trajectories. The quantity 
\langle \Delta B(n) \rangle is shown as a function of intermediate size in Fig. S4A for a non-cognate RNA and the Combo PS sequence. 
We see that the PSs increase the frequency of deviations from the ground state cluster, consistent with other regions 
of parameter space. However, the average number of clusters (Fig. S4B) significantly decreases in the presence of 
PSs. We find that the HA PS in the Combo sequence promotes rapid nucleation of a partial capsid, which tends to 
complete assembly before other nucleation events occur. In contrast, multiple nucleation events are common on the 
uniform polyelectrolytes. As noted in the main text, avoiding multiple nucleation events appears to be the mechanism 
by which PSs increase yields and confer specificity in this regime of relatively strong nonspecific interactions. 
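
The average deviation from the ground state defined in Eq. (13) can be illustrated with the sketch below. It assumes that every cluster observed in every frame has been recorded as a (size, bond count) pair, so that repeated observations provide the frequency weighting, and that the ground-state bond count for each size is supplied as a lookup table. The table values in the usage note and the function name are illustrative assumptions, not values from this work.

```python
from collections import defaultdict


def mean_bond_deficit(observations, ground_state_bonds):
    """Average number of missing bonds relative to the ground state, per size.

    observations: iterable of (n_subunits, n_bonds), one entry per cluster
        per frame, so frequent configurations are weighted more heavily.
    ground_state_bonds: dict mapping cluster size n to the bond count of
        the minimum-energy cluster of that size (assumed input).
    Returns a dict mapping n to the mean of [B_min(n) - B(c)].
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for n, bonds in observations:
        totals[n] += ground_state_bonds[n] - bonds
        counts[n] += 1
    return {n: totals[n] / counts[n] for n in counts}
```

For example, with an illustrative table `{3: 3, 4: 5}` and observations `[(3, 3), (3, 2), (4, 5)]`, the result is `{3: 0.5, 4: 0.0}`: half a bond is missing on average for trimers, and none for the four-subunit cluster.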



FIG. S4. (A) Average deviation in number of interactions \langle \Delta B(n) \rangle from the ground state configuration as a function of 
intermediate size n, averaged over assembly trajectories with a uniform polyelectrolyte (• symbols) or the Combo PS sequence 
(■ symbols). (B) Number of clusters as a function of intermediate size n averaged over assembly trajectories. Simulation 
parameters were C_salt = 50 mM, ε_ss = 7 k_BT, with the cognate RNA containing the Combo PS sequence (1 HA PS + 25 LA PSs). 


2. RNA conformational statistics and dynamics 

To simplify characterization of the ensemble of RNA conformations and its dynamics, we categorize RNA 
conformations according to the sequence of capsid vertices with which the substrate interacts. This definition neglects 
fluctuations of RNA segments between vertices, and thus most closely reflects RNA conformations in simulations with 
high salt, large ε_ps, and N_PS ≈ 20, for which the PS-PSR binding is the dominant interaction. Under parameters 
with fewer PSs and stronger electrostatics, although we observe enhanced polyelectrolyte density in the vicinity of 
capsid vertices [?], this definition does not completely reflect the diversity of RNA configurations. 
[Figure S5, panel (A): Preassembled Capsid Examples] 




FIG. S5. Examples showing RNA path exploration within preassembled capsids (A) and for assembly simulations (B). At each 
step we determine the polymer path, and if it is unique we assign it a new index. For the non-cognate and ‘1 HA + 10 LA PS’ cases, 
a unique path occurs at nearly every frame, whereas the ‘20 HA PS’ case is limited to a very small set of paths. We observe 
that this restriction in RNA dynamics corresponds with stalled assembly. These simulations are run at C_salt = 100 mM, and 
ε_ss = 5 k_BT for the non-cognate or ε_ss = 2 k_BT for the PS-containing polymers. 


Fig. S5 shows the frequency with which new RNA conformations (according to the above definition) are 
generated during simulations of RNA within a completely assembled capsid (Fig. S5A) and within an assembling capsid 
(Fig. S5B). For the non-cognate and LA PS cognate sequences, the RNA adopts a new path at almost every window, 
while the RNA with 20 HA PSs undergoes restricted dynamics, rarely transitioning to a new conformation. 
Comparison of Fig. S5A and Fig. S5B suggests that frozen RNA dynamics tend to give rise to stalled assembly trajectories; 
indeed, assembly stalls at 10 subunits for the 20 HA PS sequence (Fig. S5B upper frame). These simulations are run 
at C_salt = 100 mM. 
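
The path-indexing procedure described above (a path is the ordered sequence of capsid vertices visited by the RNA, and each previously unseen path receives a new index) can be sketched as follows. The sketch assumes that the vertex sequence for each analysis window has already been extracted and that a path and its reverse are treated as distinct; both choices, and the function names, are illustrative assumptions.

```python
def index_paths(paths):
    """Assign an integer index to the path observed in each window.

    paths: list of vertex sequences, one per analysis window (assumed input).
    A path seen for the first time receives the next unused index; a
    repeated path reuses the index it was given earlier.
    """
    seen = {}
    indices = []
    for path in paths:
        key = tuple(path)
        if key not in seen:
            seen[key] = len(seen)
        indices.append(seen[key])
    return indices


def fraction_of_new_paths(indices):
    """Fraction of windows in which a previously unseen path appears."""
    if not indices:
        return 0.0
    n_unique = len(set(indices))
    return n_unique / len(indices)
```

A fraction near 1 corresponds to the behavior reported for the non-cognate RNA (a new path in almost every window), while a small fraction corresponds to the restricted dynamics reported for 20 HA PSs.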

Previous simulations suggested that polymer-mediated capsid assembly is enhanced by cooperative polymer-subunit 
motions and by ‘sliding’ of adsorbed subunits along the polymer [42, 43, 95]. To evaluate the extent to which adsorbed 
subunits can rearrange, in Figure S6 we quantify the average residence time for the PS-PSR interaction (PS) and the 
polyelectrolyte-ARM interaction (PE) within a completed capsid for a polymer with a varying number of HA or LA 
PSs (ε_ps = 20, 5 k_BT). As expected, PS exchange is slow in comparison to exchange of PE interactions, with exchange 
of HA PSs significantly slower than exchange of LA PSs. (For HA PSs, exchange often occurs on longer timescales 
than our simulation times.) However, excess PSs (N_PS > 20) appear to facilitate exchange. 
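
An average residence time of this kind can be estimated from a per-frame record of whether a given interaction is present. The sketch below is one simple way to do so, assuming a boolean time series per interaction; the bound/unbound criterion, the handling of runs cut off by the end of the trajectory, and the function name are assumptions for illustration only.

```python
def mean_residence_time(bound, dt=1.0):
    """Mean duration of contiguous bound intervals.

    bound: sequence of truthy/falsy values, one per frame (assumed input).
    dt: time between frames.
    A run still in progress at the end of the series is counted with its
    observed length, which underestimates residence times that exceed the
    simulation length.
    """
    runs = []
    current = 0
    for is_bound in bound:
        if is_bound:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    if not runs:
        return 0.0
    return dt * sum(runs) / len(runs)
```

The truncation noted in the docstring is relevant to the HA PS case, where exchange can be slower than the simulated time.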

In the second row of Figure S6 we quantify the root mean squared fluctuations (RMSF) of the RNA within the 
capsid. We observe that increasing the number of PSs reduces the overall RNA dynamics. However, introduction of 
excess LA PSs leads to a rebound in the dynamics, further emphasizing the relationship between excess PSs and RNA 
rearrangements. In the third row, we quantify the average duration of paths within the capsid, which emphasizes the 
restriction of dynamics at N_PS = 20 HA PSs. 
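
The RMSF measure, averaged over time and RNA segments, can be sketched as below. The sketch assumes segment coordinates are expressed in a frame fixed to the capsid (so that overall capsid motion is already removed) and stored as nested sequences; these choices and the function name are illustrative assumptions.

```python
import math


def mean_rmsf(frames):
    """Root mean squared fluctuation, averaged over segments.

    frames: list of frames, each a list of (x, y, z) segment positions in
        a capsid-fixed coordinate system (assumed input).
    For each segment, the fluctuation is measured about its time-averaged
    position; the per-segment values are then averaged.
    """
    n_frames = len(frames)
    n_segments = len(frames[0])
    total = 0.0
    for s in range(n_segments):
        mean = [sum(frame[s][k] for frame in frames) / n_frames for k in range(3)]
        msd = sum(
            sum((frame[s][k] - mean[k]) ** 2 for k in range(3))
            for frame in frames
        ) / n_frames
        total += math.sqrt(msd)
    return total / n_segments
```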

It is interesting to compare these different measures of dynamics in Figure S6. For example, the first and third 
rows indicate an increase in dynamics for N_PS > 20; however, this is not captured by the RMSF. We infer that the 
exchange of HA PSs, even at excess N_PS, is a slower process, and thus their effect is obscured by other measures of 
RNA dynamics. However, this slow motion is still functionally relevant, as observed in the increase in yield with 
increasing HA N_PS. 
FIG. S6. Different measures of RNA dynamics within an assembled capsid. The first row presents the average residence time 
for PS-binding site interactions and polyelectrolyte-ARM interactions. The second row presents the average root mean squared 
fluctuations (averaged over time and RNA segments). The third row presents the average duration of RNA paths. These 
simulations are performed for a preassembled capsid containing a single RNA of optimal length with the specified N_PS, with 
ε_ps = 20 k_BT for HA PSs and ε_ps = 5 k_BT for LA PSs. 


C. Effect of PS on equilibrium encapsidation and substrate length 


In this section we describe the effect of PSs on the thermodynamically optimal RNA length for encapsidation L*_eq, 
defined as the length which minimizes the free energy of the RNA-capsid complex [?]. We found previously that L*_eq 
corresponds to the length which optimizes finite-time assembly yields, at least within measured parameter ranges [?]. 
We also found that, when the model was adapted to match features of specific viruses, namely the interior volume 
of the capsid and the length and charge of the ARM amino acids, the calculated values of L*_eq closely agree with the 
genome length for those viruses when effects of base pairing were included in the model [?]. 

Here, we present the effect of PSs on L*_eq. We calculate L*_eq using the same protocol as in [?, 44]: a very long RNA 
strand is placed in a preassembled capsid, with a small section of the capsid rendered permeable to the RNA. We then 
perform dynamics during which the RNA rearranges within the capsid and partially extrudes from the permeable 
section of the capsid. Once the system has equilibrated, L*_eq is calculated as the average number of RNA segments 
remaining within the capsid, measured over a period of 5 × 10^[?] timesteps. Values of L*_eq measured by this protocol 
were found to closely agree with the values which minimized the free energy calculated using the Widom test-particle 
method [?] as extended to calculate polymer residual chemical potentials [?]. 
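
The final averaging step of this protocol can be sketched as follows. The sketch assumes a roughly spherical capsid interior of known center and radius for deciding which segments are inside, and a user-chosen number of equilibration frames to discard; both are simplifying assumptions, and the function names and the numbers in the usage note are hypothetical.

```python
import math


def count_segments_inside(positions, center, radius):
    """Count RNA segments within `radius` of `center`.

    Assumes a spherical capsid interior, which is a simplification.
    """
    return sum(1 for p in positions if math.dist(p, center) <= radius)


def equilibrium_length(inside_counts, n_equilibration):
    """Average number of encapsidated segments after equilibration.

    inside_counts: number of segments inside the capsid in each frame.
    n_equilibration: number of initial frames to discard (assumed input).
    """
    production = inside_counts[n_equilibration:]
    if not production:
        raise ValueError("no frames left after discarding equilibration")
    return sum(production) / len(production)
```

For example, a hypothetical series of counts `[700, 650, 600, 580, 570, 575]` with the first three frames discarded averages to 575.0 segments.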

The dependence of L*_eq on the PS-PSR binding strength ε_ps is shown in Fig. S7 for RNA with varying numbers 
of PSs. We report the frequency of PSs, defined as the number of polyelectrolyte segments between each PS (which 
are uniformly spaced), since the number of encapsidated PSs can depend on L*_eq. We see that, while a single PS has 
essentially no effect on L*_eq, increasing the PS strength and frequency can significantly increase L*_eq. Note that since 
the single PS has essentially no effect on L*_eq, the optimal length for the Combo sequence with 1 HA PS and 25 LA 
PSs is approximately 628 segments (relative to 575 for the non-cognate RNA). 

We previously found that dynamical assembly of complete capsids around uniform polyelectrolytes occurs only for 
polyelectrolyte lengths within about 10% of L*_eq. In Fig. S8 we show the yield of well-formed capsids at the end of 
long but finite-time dynamical simulations for a uniform polyelectrolyte and the Combo PS sequence. We see that the 
PSs slightly increase the optimal length, to approximately the same value as the thermodynamic L*_eq. While the 
distribution of yields as a function of polyelectrolyte length is slightly broader for the PS sequence than for the uniform 
polyelectrolyte, variations are still limited to ~10%. This result corroborates the observation of Fig. S7 that the 
increase in L*_eq for the LA PSs is relatively small, suggesting that the optimal RNA length does not differ significantly 
between non-cognate and cognate RNAs. This result is consistent with the observation that values of L*_eq from a model 
which did not account for PSs agreed with actual genome lengths. 

FIG. S7. (A) The thermodynamic optimal RNA length L*_eq is shown as a function of C_salt with monovalent salt for a uniform 
polyelectrolyte (the non-cognate RNA). (B) Dependence of L*_eq on PS-PSR binding strength ε_ps, for RNAs with uniformly 
spaced PSs, with indicated numbers of RNA segments between each PS. The salt concentration is C_salt = 100 mM. 



[Figure S8; horizontal axis: polymer length] 


FIG. S8. Yield of well-formed capsids in dynamical assembly simulations as a function of length of a polyelectrolyte with no 
PSs (• symbols) or 1 HA + 20 LA PSs (■ symbols). Parameters are ε_ss = 5 k_BT for the uniform polyelectrolyte, ε_ss = 2 k_BT 
for the PS sequence, and C_salt = 100 mM in both cases. 


D. Dependence of RNA path properties on ARM length 


Previous simulations showed that L*_eq depends on the charge and excluded volume of the capsid, as determined by the 
ARM charge, ARM length and capsid interior volume [19]. Here we present results from simulations of complete 
capsids with varying ARM charge, encapsidating RNA with N_PS = 20 LA PSs. The ARM charge determines the 
optimal length of the substrate [?], which determines the spacing between the PSs and the geometry of the path 
within the capsid. Figure S9 is a histogram showing the distribution of step lengths for the polymer path (see Sec. SII B); 
i.e. a step length of 1 corresponds to stepping to a nearest-neighbor vertex, while a step of length 2 corresponds to 
skipping one vertex. Interestingly, this distribution behaves non-monotonically: for short ARMs (lengths 2 and 3) stepping to 
the nearest neighbor is strongly favored, for ARM = 4 there is a clear shift towards a separation of 2, which then is 
reversed for ARM = 5. Our analysis here suggests that the structure of the capsid protein-substrate complex and 
the ability of the substrate to promote assembly are determined by many factors, including the capsid geometry and 
charge, substrate length and charge, and PS number, strength, and spacing. 
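
The step-length histogram can be illustrated with the sketch below, which interprets the step length between consecutive vertices of a path as their graph distance on the network of capsid vertices (1 for nearest neighbors, 2 when one vertex is skipped). This interpretation, the adjacency-dictionary input, and the function names are assumptions for illustration; the toy six-vertex ring in the usage note is not the capsid geometry.

```python
from collections import Counter, deque


def graph_distance(adjacency, source, target):
    """Shortest number of edges between two vertices (breadth-first search)."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in adjacency[node]:
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    raise ValueError("vertices are not connected")


def step_length_histogram(path, adjacency):
    """Count how often each step length occurs along a vertex path."""
    return Counter(
        graph_distance(adjacency, a, b) for a, b in zip(path, path[1:])
    )
```

On a toy ring `{i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}`, the path `[0, 1, 3, 4]` contains two steps of length 1 and one step of length 2.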
FIG. S9. Histogram describing the conformation of encapsidated RNA with N_PS = 20 PSs for varying ARM lengths. Here 
the x-axis is the distance between vertices traced in the polymer path; a step length of 1 indicates a step to a nearest-neighbor 
vertex, while a step length of 2 indicates a step which skips a vertex. For each ARM length, the optimal RNA length was 
used: ARM=2: 350 segments, ARM=3: 428 segments, ARM=4: 504 segments, ARM=5: 575 segments. This changes the PS 
spacing, and the frequency of each step length. 


E. Simulation Movies 

Movie 1 - Direct competition simulation between a non-cognate RNA (red) and a cognate RNA (magenta) with one HA PS. 
Simulation conditions: C_salt = 100 mM, ε_ss = 5 k_BT, r_ex = 1, excess subunits. 

Movie 2 - Direct competition simulation between a non-cognate RNA (red) and a cognate RNA (magenta) using the ‘Combo’ PS 
sequence with N_PS = 20. Simulation conditions: C_salt = 100 mM, ε_ss = 2 k_BT, r_ex = 1, excess subunits. 

Movie 3 - Direct competition simulation between a non-cognate RNA (red) and a cognate RNA (magenta) using the ‘Combo’ PS 
sequence with N_PS = 30. Simulation conditions: C_salt = 500 mM, ε_ss = 6 k_BT, r_ex = 1, limiting subunits. 







